# Introduction 

In this coursework we present two variants of the GAN architecture, DCGAN and CGAN, applied to the MNIST dataset, and evaluate performance metrics across various optimisation techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.

## GAN
Generative Adversarial Networks are systems of models which learn to output data similar to their training data. A trained GAN takes noise as input and produces output with the same dimensions as, and ideally the same features as, the samples it was trained on.

GANs employ two neural networks, a *discriminator* and a *generator*, which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real ones, while the task of the *generator* is to produce realistic images which are able to fool the discriminator.
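This adversarial game translates into alternating gradient updates. Below is a minimal sketch of one training step, assuming pre-built Keras models `generator`, `discriminator`, and a stacked `combined` model (generator feeding into a frozen discriminator); the names are illustrative, not our exact implementation:

```python
import numpy as np

def train_step(generator, discriminator, combined, real_images, latent_dim=100):
    batch_size = real_images.shape[0]
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    fake_images = generator.predict(noise)

    # Discriminator update: real images are labelled 1, generated images 0.
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # Generator update: trained through the frozen discriminator so that
    # its outputs are pushed towards being classified as real.
    g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
    return 0.5 * (d_loss_real + d_loss_fake), g_loss
```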

### Mode Collapse

Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse, and generation of low-quality images due to unbalanced G-D losses.

Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200,000 iterations of the GAN network presented in the appendix, figure \ref{fig:vanilla_gan}. The output of the generator represents only a few of the labels originally fed to it. At that point the loss function of the generator stops improving, as shown in figure \ref{fig:vanilla_loss}. We observe the discriminator loss tending to zero as it learns to classify the fake 1's, while the generator is stuck producing 1's.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/generic_gan_loss.png}
\caption{Shallow GAN D-G Loss}
\label{fig:vanilla_loss}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf}
\caption{Shallow GAN mode collapse}
\label{fig:mode_collapse}
\end{center}
\end{figure}

A significant improvement on this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN).

# DCGAN

## DCGAN Architecture description

DCGAN exploits convolutional stride to perform downsampling and transposed convolution to perform upsampling. 

We use batch normalization at the output of each convolutional layer (except the output layer of the generator
and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for the generator) and `LeakyReLU` with slope 0.2 (for the discriminator).
The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' outputs in
the discriminator pass through dropout before feeding the next layers; we noticed a significant improvement in performance with this, and estimated an optimal dropout rate of 0.25.
The optimizer used for training is `Adam(learning_rate=0.002, beta_1=0.5)`.
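As an illustration, a Keras sketch of a generator/discriminator pair following the description above; layer widths and kernel sizes are indicative, not our exact configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Reshape, Conv2D, Conv2DTranspose,
                                     BatchNormalization, LeakyReLU, Activation,
                                     Dropout, Flatten)
from tensorflow.keras.optimizers import Adam

def build_generator(latent_dim=100):
    # Upsampling via transposed convolutions; batch norm after each
    # convolutional layer except the output, ReLU activations, tanh output.
    return Sequential([
        Dense(128 * 7 * 7, input_dim=latent_dim),
        Reshape((7, 7, 128)),
        Conv2DTranspose(128, 3, strides=2, padding='same'),  # 7x7 -> 14x14
        BatchNormalization(),
        Activation('relu'),
        Conv2DTranspose(64, 3, strides=2, padding='same'),   # 14x14 -> 28x28
        BatchNormalization(),
        Activation('relu'),
        Conv2D(1, 3, padding='same', activation='tanh'),
    ])

def build_discriminator(dropout=0.25):
    # Downsampling via strided convolutions; LeakyReLU(0.2) and dropout
    # after each block, no batch norm on the input layer, sigmoid output.
    model = Sequential([
        Conv2D(64, 3, strides=2, padding='same', input_shape=(28, 28, 1)),
        LeakyReLU(0.2),
        Dropout(dropout),
        Conv2D(128, 3, strides=2, padding='same'),
        BatchNormalization(),
        LeakyReLU(0.2),
        Dropout(dropout),
        Flatten(),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(learning_rate=0.002, beta_1=0.5))
    return model
```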

The main architecture used can be observed in figure \ref{fig:dcganarc}.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/DCGAN_arch.pdf}
\caption{DCGAN Architecture}
\label{fig:dcganarc}
\end{center}
\end{figure}

## Tests on MNIST

We propose three different architectures, varying the size of the convolutional layers in the generator while retaining the structure proposed in figure \ref{fig:dcganarc}:

\begin{itemize}
\item Shallow: Conv128-Conv64
\item Medium: Conv256-Conv128
\item Deep: Conv512-Conv256
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/short_dcgan_ex.pdf}
\includegraphics[width=24em]{fig/short_dcgan.png}
\caption{Shallow DCGAN}
\label{fig:dcshort}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/med_dcgan_ex.pdf}
\includegraphics[width=24em]{fig/med_dcgan.png}
\caption{Medium DCGAN}
\label{fig:dcmed}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/long_dcgan_ex.pdf}
\includegraphics[width=24em]{fig/long_dcgan.png}
\caption{Deep DCGAN}
\label{fig:dclong}
\end{center}
\end{figure}

Deeper architectures make it easier to balance the G-D losses. Medium DCGAN achieves very good performance, balancing both binary cross-entropy losses at around 1 after 5,000 epochs, and showing significantly lower oscillation over longer training, even when compared to Deep DCGAN.

Since we are training with no labels, the generator will simply try to output images that fool the discriminator, without mapping directly to one specific class.
Examples of this can be observed in all the output groups reported above, as some of the shapes look very odd (but smooth enough to be labelled as real). This
specific issue is mitigated by training the network for more epochs or introducing a deeper architecture, as can be deduced from a qualitative comparison
between figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}.

Applying Virtual Batch Normalization to Medium DCGAN does not produce observable changes in G-D balancing, but reduces within-batch correlation. Although it
is difficult to assess the improvement qualitatively, figure \ref{fig:vbn_dc} shows the results of introducing this technique.
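Conceptually, VBN normalises each example against a fixed reference batch chosen once at the start of training, rather than against its own minibatch. A simplified numpy sketch (the original formulation also mixes the current example into the statistics):

```python
import numpy as np

def virtual_batch_norm(x, ref_batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalise activations x with the statistics of a fixed reference
    # batch, so an example's normalisation no longer depends on the other
    # examples in its own minibatch (reducing within-batch correlation).
    mu = ref_batch.mean(axis=0)
    var = ref_batch.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```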

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/vbn_dc.pdf}
\caption{DCGAN Virtual Batch Normalization}
\label{fig:vbn_dc}
\end{center}
\end{figure}

We evaluated the effect of different dropout rates (results in the appendix, figures \ref{dcdrop1_1}, \ref{dcdrop1_2}, \ref{dcdrop2_1} and \ref{dcdrop2_2}) and concluded that optimising this parameter is essential for good performance: a high dropout rate results in DCGAN producing only artifacts that do not really match any specific class, due to the generator performing better than the discriminator; conversely, a low dropout rate leads to an initial stabilisation of the G-D losses, but results in oscillation when training for a large number of epochs.

While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that these architectures perform better than
the simple GAN presented in the introduction.

# CGAN

## CGAN Architecture description
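A common conditioning scheme, shown here only as an illustrative sketch (not necessarily the exact architecture used), embeds the class label and concatenates it with the generator's noise input, so that each generated image is tied to a target digit:

```python
from tensorflow.keras.layers import (Input, Dense, Embedding, Flatten,
                                     Concatenate, LeakyReLU)
from tensorflow.keras.models import Model

def build_cgan_generator(latent_dim=100, n_classes=10):
    # Condition the generator by embedding the class label and
    # concatenating it with the latent noise vector.
    noise = Input(shape=(latent_dim,))
    label = Input(shape=(1,), dtype='int32')
    label_embedding = Flatten()(Embedding(n_classes, latent_dim)(label))
    x = Concatenate()([noise, label_embedding])
    x = Dense(256)(x)
    x = LeakyReLU(0.2)(x)
    x = Dense(512)(x)
    x = LeakyReLU(0.2)(x)
    img = Dense(28 * 28, activation='tanh')(x)  # flattened 28x28 output
    return Model([noise, label], img)
```

The discriminator is conditioned symmetrically, concatenating the same label embedding with its image input.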

## Tests on MNIST

Try **different architectures, hyper-parameters** and, if necessary, the aspects of **one-sided label
smoothing**, **virtual batch normalization**, and balancing G and D.
Please perform qualitative analyses of the generated images and discuss, with results, what
challenges arise and how they are specifically addressed. Is there a **mode collapse issue?**

# Inception Score


## Classifier Architecture Used

## Results

Measure the inception scores, i.e. use the class labels to
generate images in CGAN and compare them with the predicted labels of the generated images.

Also report the recognition accuracies on the
MNIST real testing set (10K), in comparison to the inception scores.

**Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or
architectures in Q2.**

We measure the performance of the considered GANs using the Inception Score [@inception], as calculated
with L2-Net logits.

$$ \textrm{IS}(x) = \exp\left(\mathbb{E}_x\left[\textrm{KL}\left(p(y \mid x) \parallel p(y)\right)\right]\right) $$
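In practice this can be computed directly from the classifier's class probabilities. A numpy sketch, where `probs` holds $p(y \mid x)$ for the generated images (here, the softmax over the L2-Net logits):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, n_classes) array of p(y|x) from the evaluation classifier.
    p_y = probs.mean(axis=0)  # marginal distribution p(y)
    # Per-image KL(p(y|x) || p(y)), then exponentiate the expectation.
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```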

| GAN type     | Inception Score (L2-Net) | Test Accuracy (L2-Net) |
|--------------|--------------------------|------------------------|
| MNIST (ref)  | 9.67                     | 1%                     |
| cGAN         | 6.01                     | 2%                     |
| cGAN+VB      | 6.2                      | 3%                     |
| cGAN+LS      | 6.3                      | .                      |
| cGAN+VB+LS   | 6.4                      | .                      |
| cDCGAN+VB    | 6.5                      | .                      |
| cDCGAN+LS    | 6.8                      | .                      |
| cDCGAN+VB+LS | 7.3                      | .                      |



# Re-training the handwritten digit classifier

## Results

Retrain with different portions and test with BOTH fake and real queries. Please **vary** the portions
of the real training and synthetic images, e.g. 10%, 20%, 50% and 100% of each.

## Adapted Training Strategy

*Using even a small number of real samples per class would already give a high recognition rate,
which is difficult to improve. Use few real samples per class, and plenty of generated images of
good quality, and see if the testing accuracy can be improved or not over the model trained using
the few real samples only.
Did you have to change the strategy for training the classification network in order to improve the
testing accuracy? For example, use synthetic data to initialise the network parameters, followed
by fine-tuning the parameters with the real data set. Or use realistic synthetic data, selected based
on the confidence score from the classification network pre-trained on real data. If yes, please
specify your training strategy in detail.
Analyse and discuss the outcome of the experimental results.*
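As one possible strategy along these lines, a sketch that initialises on synthetic data and then fine-tunes on the few real samples at a lower learning rate (assuming a Keras `classifier` with one-hot labels; all names and hyper-parameters are illustrative):

```python
from tensorflow.keras.optimizers import Adam

def pretrain_then_finetune(classifier, x_synth, y_synth, x_real, y_real):
    # Initialise the network parameters on plentiful synthetic data...
    classifier.compile(optimizer=Adam(1e-3), loss='categorical_crossentropy',
                       metrics=['accuracy'])
    classifier.fit(x_synth, y_synth, epochs=10, batch_size=128)
    # ...then fine-tune on the few real samples with a lower learning rate,
    # so the real data refines rather than overwrites the initialisation.
    classifier.compile(optimizer=Adam(1e-4), loss='categorical_crossentropy',
                       metrics=['accuracy'])
    classifier.fit(x_real, y_real, epochs=20, batch_size=32)
    return classifier
```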

# Bonus

This is an open question. Do you have any other ideas to improve GANs, or
more insightful and comparative evaluations of GANs? Ideas are not limited. For instance,

\begin{itemize}

\item How do you compare GAN with PCA? We learnt about PCA as another generative model in the
Pattern Recognition module (EE468/EE9SO29/EE9CS729). Strengths/weaknesses?

\item Take the pre-trained classification network using 100\% real training examples and use it
to extract the penultimate layer's activations (embeddings) of 100 randomly sampled real
test examples and 100 randomly sampled synthetic examples from all the digits, i.e. 0-9.
Use an embedding method, e.g. t-SNE [1] or PCA, to project them to a 2D subspace and
plot them (see the sketch after this list). Explain what kind of patterns you observe between the digits on real and
synthetic data. Also plot the distribution of confidence scores on these real and synthetic
sub-sampled examples by the classification network trained on 100\% real data, on two
separate graphs. Explain the trends in the graphs.

\item Can we add a classification loss (using the pre-trained classifier) to CGAN, and see if this
improves it? The classification loss would help the generated images maintain their class
labels, i.e. improving the inception score. What would be the respective network
architecture and loss function?

\end{itemize}
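For the embedding item above, a sketch of the projection step (assuming a Keras `classifier` whose penultimate layer is named `'penultimate'`; both the name and the layer are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from tensorflow.keras.models import Model

def plot_embeddings(classifier, x_real, x_synth, layer_name='penultimate'):
    # Truncate the classifier at its penultimate layer to extract embeddings,
    # then project real and synthetic samples into 2D with t-SNE.
    embed_model = Model(classifier.input,
                        classifier.get_layer(layer_name).output)
    feats = embed_model.predict(np.concatenate([x_real, x_synth]))
    proj = TSNE(n_components=2).fit_transform(feats)
    n = len(x_real)
    plt.scatter(proj[:n, 0], proj[:n, 1], label='real', s=8)
    plt.scatter(proj[n:, 0], proj[n:, 1], label='synthetic', s=8)
    plt.legend()
    plt.show()
```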

# References

<div id="refs"></div>

\newpage

# Appendix 

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/vanilla_gan_arc.pdf}
\caption{Vanilla GAN Architecture}
\label{fig:vanilla_gan}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/dcgan_dropout01_gd.png}
\caption{DCGAN Dropout 0.1 G-D Losses}
\label{fig:dcdrop1_1}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=14em]{fig/dcgan_dropout01.png}
\caption{DCGAN Dropout 0.1 Generated Images}
\label{fig:dcdrop1_2}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/dcgan_dropout05_gd.png}
\caption{DCGAN Dropout 0.5 G-D Losses}
\label{fig:dcdrop2_1}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=14em]{fig/dcgan_dropout05.png}
\caption{DCGAN Dropout 0.5 Generated Images}
\label{fig:dcdrop2_2}
\end{center}
\end{figure}