# Introduction In this coursework we present two variants of the GAN architecture, DCGAN and CGAN, applied to the MNIST dataset, and evaluate their performance across various optimisation techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits. ## GAN Generative Adversarial Networks are a system of models which learn to output data similar to their training data. A trained GAN takes noise as an input and produces an output with the same dimensions, and ideally the same features, as the samples it was trained on. GANs employ two neural networks, a *discriminator* and a *generator*, which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the *generator* is to produce realistic images which are able to fool the discriminator. Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse and low-quality generated images due to unbalanced G-D losses. Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200,000 iterations of the GAN presented in the appendix, figure \ref{fig:vanilla_gan}. The output of the generator represents only a few of the labels originally fed to it. At that point the loss of the generator stops improving, as shown in figure \ref{fig:vanilla_loss}. We observe the discriminator loss tending to zero as it learns to classify the fake 1's, while the generator is stuck producing 1's. A significant improvement on this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN). # DCGAN ## DCGAN Architecture description DCGAN exploits convolutional stride to perform downsampling and transposed convolution to perform upsampling. 
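The corresponding shape arithmetic can be sketched as follows (the kernel size, stride and padding values are illustrative, not necessarily the ones used in our networks):

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Spatial size after a strided convolution (downsampling)."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_transpose_output_size(n, kernel, stride, padding=0):
    """Spatial size after a transposed convolution (upsampling)."""
    return (n - 1) * stride - 2 * padding + kernel
```

For example, a stride-2 convolution with kernel size 4 and padding 1 halves a 28x28 MNIST image to 14x14, and a transposed convolution with the same parameters doubles it back to 28x28.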
We use batch normalization at the output of each convolutional layer (except the output layer of the generator and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` for the generator and `LeakyReLU` with slope 0.2 for the discriminator. The output activations are `tanh` for the generator and `sigmoid` for the discriminator. In the discriminator, dropout is applied to the output of each convolutional layer before it is fed to the next layer; we noticed a significant improvement in performance and estimated an optimal dropout rate of 0.25. The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`. The main architecture used can be observed in figure \ref{fig:dcganarc}. \begin{figure} \begin{center} \includegraphics[width=24em]{fig/DCGAN_arch.pdf} \caption{DCGAN Architecture} \label{fig:dcganarc} \end{center} \end{figure} ## Tests on MNIST We propose 3 different architectures, varying the size of the convolutional layers in the generator while retaining the structure proposed in figure \ref{fig:dcganarc}: \begin{itemize} \item Shallow: Conv128-Conv64 \item Medium: Conv256-Conv128 \item Deep: Conv512-Conv256 \end{itemize} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/med_dcgan_ex.pdf} \includegraphics[width=24em]{fig/med_dcgan.png} \caption{Medium DCGAN} \label{fig:dcmed} \end{center} \end{figure} We notice that deeper architectures make it easier to balance the G-D losses. Medium DCGAN achieves very good performance, balancing both binary cross-entropy losses at around 1 after 5,000 epochs and showing significantly lower oscillation over longer training, even when compared to Deep DCGAN. Since we are training with no labels, the generator will simply try to output images that fool the discriminator, but these do not directly map to one specific class. 
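A minimal sketch of the discriminator described above, written with `tensorflow.keras` (this is a sketch rather than the exact coursework code: the convolutional widths and kernel sizes are illustrative, and note that Keras spells the Adam momentum parameter `beta_1`):

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

def build_discriminator(dropout_rate=0.25):
    """DCGAN-style discriminator: strided convolutions downsample,
    batch normalization is skipped on the input layer, and dropout
    follows each convolutional block."""
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, 3, strides=2, padding="same"),  # no BN on input layer
        layers.LeakyReLU(0.2),                            # slope 0.2
        layers.Dropout(dropout_rate),                     # estimated optimum 0.25
        layers.Conv2D(128, 3, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Dropout(dropout_rate),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),            # probability input is real
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer=optimizers.Adam(learning_rate=0.002, beta_1=0.5))
    return model
```

The generator mirrors this structure with `Conv2DTranspose` layers, `ReLU` activations and a `tanh` output.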
Examples of this can be observed in all the output groups reported above, as some of the shapes look very odd (but smooth enough to be labelled as real). This specific issue is addressed by training the network for more epochs or introducing a deeper architecture, as can be deduced from a qualitative comparison between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}. Applying Virtual Batch Normalization to Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it is difficult to assess the improvements qualitatively, figure \ref{fig:vbn_dc} shows the results of introducing this technique. \begin{figure} \begin{center} \includegraphics[width=24em]{fig/vbn_dc.pdf} \caption{DCGAN Virtual Batch Normalization} \label{fig:vbn_dc} \end{center} \end{figure} We evaluated the effect of different dropout rates (results in the appendix, figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that optimizing this parameter is essential for good performance: a high dropout rate results in DCGAN producing only artifacts that do not really match any specific class, due to the generator performing better than the discriminator. Conversely, a low dropout rate leads to an initial stabilisation of the G-D losses, but results in oscillation when training for a large number of epochs. While training the different proposed DCGAN architectures we did not observe mode collapse, confirming that this architecture performs better than the simple GAN presented in the introduction. 
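Virtual batch normalization, as applied above, normalizes each example using statistics computed from a fixed reference batch chosen once at the start of training, rather than from the current mini-batch; this is what removes the within-batch dependence. A minimal numpy sketch of the core idea (a full implementation would also blend in the current example's own statistics and learn scale and shift parameters):

```python
import numpy as np

def virtual_batch_norm(batch, reference_batch, eps=1e-5):
    """Normalize `batch` with the mean/std of a fixed reference batch.

    Unlike standard batch norm, the statistics do not depend on the
    current mini-batch, so samples within a batch stay independent.
    """
    mu = reference_batch.mean(axis=0)
    sigma = reference_batch.std(axis=0)
    return (batch - mu) / (sigma + eps)
```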
# CGAN ## CGAN Architecture description \begin{figure} \begin{center} \includegraphics[width=24em]{fig/CGAN_arch.pdf} \caption{CGAN Architecture} \label{fig:cganarc} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/CDCGAN_arch.pdf} \caption{Deep Convolutional CGAN Architecture} \label{fig:cdcganarc} \end{center} \end{figure} ## Tests on MNIST Dropout affects the performance of the non-convolutional CGAN architecture much less than it does DCGAN: the images produced, together with the G-D losses, remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. \begin{figure} \begin{center} \includegraphics[width=24em]{fig/med_cgan_ex.pdf} \includegraphics[width=24em]{fig/med_cgan.png} \caption{Medium CGAN} \label{fig:cmed} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/smoothing_ex.pdf} \includegraphics[width=24em]{fig/smoothing.png} \caption{One sided label smoothing} \label{fig:smooth} \end{center} \end{figure} # Inception Score ## Classifier Architecture Used ## Results 
We measure the performance of the considered GANs using the Inception score [-inception], as calculated with L2-Net logits: $$ \textrm{IS}(x) = \exp\left(\mathbb{E}_x\left[ \textrm{KL}\left( p(y|x) \,\|\, p(y) \right) \right]\right) $$ \begin{table}[] \begin{tabular}{llll} & \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline Shallow CGAN & 0.645 & 3.57 & 8:14 \\ Medium CGAN & 0.715 & 3.79 & 10:23 \\ Deep CGAN & 0.739 & 3.85 & 16:27 \\ Convolutional CGAN & 0.737 & 4 & 25:27 \\ \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? \\ *MNIST original & 0.9846 & 9.685 & N/A \end{tabular} \end{table} # Re-training the handwritten digit classifier ## Results In this section we analyse the effect of retraining the classification network on a mix of real and generated data, highlighting the benefits of injecting generated samples into the original training set to boost testing accuracy. 
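The inception score defined earlier can be computed directly from the classifier's predicted label distributions; a minimal numpy sketch, where `p_yx` holds one softmax output p(y|x) per row:

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """IS = exp( E_x[ KL(p(y|x) || p(y)) ] ).

    p_yx: array of shape (n_samples, n_classes), one softmax output per row.
    """
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Uniform predictions give a score of 1, while confident predictions spread evenly over all ten digit classes give the maximum score of 10.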
As observed in figure \ref{fig:mix1}, we performed two experiments for performance evaluation: \begin{itemize} \item Keeping the same total number of training samples (55,000) while changing the ratio of real to generated data. \item Keeping the whole training set from MNIST and adding generated samples from CGAN. \end{itemize} \begin{figure} \begin{center} \includegraphics[width=12em]{fig/mix_zoom.png} \includegraphics[width=12em]{fig/added_generated_data.png} \caption{Mix data, left unchanged samples number, right added samples} \label{fig:mix1} \end{center} \end{figure} Both experiments show that the optimal amount of generated data for boosting testing accuracy on the original MNIST dataset is around 30%; in both cases we observe an increase in accuracy of around 0.3%. In the absence of original data the testing accuracy drops significantly, to around 20% in both cases. ## Adapted Training Strategy For this section we use 550 samples from MNIST (55 samples per class). Training the classifier poses major challenges, since the amount of samples available for training is relatively small. Training for 100 epochs, as in the previous section, is clearly not enough: the MNIST test set accuracy reached in this case is only 62%, while training for 300 epochs reaches up to 88%. The learning curve in figure \ref{fig:few_real} suggests we cannot achieve much better with this very small amount of data, since the validation accuracy flattens while the training accuracy almost reaches 100%. \begin{figure} \begin{center} \includegraphics[width=24em]{fig/train_few_real.png} \caption{Training with few real samples} \label{fig:few_real} \end{center} \end{figure} We conduct one experiment, feeding the test set to an L2-Net trained exclusively on data generated from our CGAN. 
It is noticeable that training for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained while training the network with only the few real samples. This indicates that we can use the generated data to train the first steps of the network (initial weights) and then use the real samples for 300 epochs to obtain finer tuning. As observed in figure \ref{fig:few_init}, the first steps of retraining show oscillation, since the fine tuning tries to adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy is somewhat successful at improving testing accuracy. \begin{figure} \begin{center} \includegraphics[width=24em]{fig/initialization.png} \caption{Retraining with initialization from generated samples} \label{fig:few_init} \end{center} \end{figure} We try to improve on these results by retraining L2-Net with mixed data: the few real samples and plenty of generated samples (160,000); the learning curve is shown in figure \ref{fig:training_mixed}. The peak accuracy reached is 91%. We then remove the generated samples and fine-tune using only the real samples. After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is boosted to 92%, making this technique the most successful attempt at improvement while using a limited amount of data from the MNIST dataset. \begin{figure} \begin{center} \includegraphics[width=12em]{fig/training_mixed.png} \includegraphics[width=12em]{fig/fine_tuning.png} \caption{Retraining; Mixed initialization left, fine tuning right} \label{fig:training_mixed} \end{center} \end{figure} Examples of classification failures are displayed in figure \ref{fig:retrain_fail}. The results indicate that the network we trained is actually performing quite well, as most of the misclassified testing images (mainly nines and fours) show ambiguities. 
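The two-stage strategy used above (pretraining on plentiful generated data, then fine-tuning on the few real samples) can be sketched with a toy softmax classifier; the data here is synthetic Gaussian blobs standing in for "generated" and "real" samples, so only the warm-start pattern carries over, not the numbers:

```python
import numpy as np

def train_step(W, X, y, lr=0.1):
    """One full-batch gradient step of multinomial logistic regression."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(W.shape[1])[y]
    W -= lr * X.T @ (p - onehot) / len(X)
    return W

rng = np.random.default_rng(0)
n_classes, dim = 3, 5
centres = rng.normal(size=(n_classes, dim))

def sample(n, noise):
    y = rng.integers(0, n_classes, n)
    return centres[y] + noise * rng.normal(size=(n, dim)), y

X_gen, y_gen = sample(600, 0.3)   # plentiful "generated" data, slightly noisier
X_real, y_real = sample(60, 0.2)  # few "real" samples

W = np.zeros((dim, n_classes))
for _ in range(50):               # stage 1: pretrain on generated data
    W = train_step(W, X_gen, y_gen)
for _ in range(50):               # stage 2: fine-tune on the few real samples
    W = train_step(W, X_real, y_real)
```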
# Bonus This is an open question. Do you have any other ideas to improve GANs, or more insightful and comparative evaluations of GANs? Ideas are not limited. For instance, \begin{itemize} \item How do you compare GAN with PCA? We learnt PCA as another generative model in the Pattern Recognition module (EE468/EE9SO29/EE9CS729). Strengths/weaknesses? \item Take the pre-trained classification network using 100% real training examples and use it to extract the penultimate layer's activations (embeddings) of 100 randomly sampled real test examples and 100 randomly sampled synthetic examples from all the digits, i.e. 0-9. Use an embedding method, e.g. t-SNE [1] or PCA, to project them to a 2D subspace and plot them. Explain what kind of patterns you observe between the digits on real and synthetic data. Also plot the distribution of confidence scores on these real and synthetic sub-sampled examples by the classification network trained on 100% real data, on two separate graphs. Explain the trends in the graphs. \item Can we add a classification loss (using the pre-trained classifier) to CGAN, and see if this improves? The classification loss would help the generated images maintain their class labels, i.e. improving the inception score. What would be the respective network architecture and loss function? \end{itemize} # References
\newpage # Appendix \begin{figure} \begin{center} \includegraphics[width=24em]{fig/vanilla_gan_arc.pdf} \caption{Vanilla GAN Architecture} \label{fig:vanilla_gan} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/generic_gan_loss.png} \caption{Shallow GAN D-G Loss} \label{fig:vanilla_loss} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf} \caption{Shallow GAN mode collapse} \label{fig:mode_collapse} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/short_dcgan_ex.pdf} \includegraphics[width=24em]{fig/short_dcgan.png} \caption{Shallow DCGAN} \label{fig:dcshort} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/long_dcgan_ex.pdf} \includegraphics[width=24em]{fig/long_dcgan.png} \caption{Deep DCGAN} \label{fig:dclong} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/dcgan_dropout01_gd.png} \caption{DCGAN Dropout 0.1 G-D Losses} \label{fig:dcdrop1_1} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=14em]{fig/dcgan_dropout01.png} \caption{DCGAN Dropout 0.1 Generated Images} \label{fig:dcdrop1_2} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/dcgan_dropout05_gd.png} \caption{DCGAN Dropout 0.5 G-D Losses} \label{fig:dcdrop2_1} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=14em]{fig/dcgan_dropout05.png} \caption{DCGAN Dropout 0.5 Generated Images} \label{fig:dcdrop2_2} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/short_cgan_ex.pdf} \includegraphics[width=24em]{fig/short_cgan.png} \caption{Shallow CGAN} \label{fig:cshort} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/long_cgan_ex.pdf} \includegraphics[width=24em]{fig/long_cgan.png} \caption{Deep CGAN} 
\label{fig:clong} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/cgan_dropout01.png} \caption{CGAN Dropout 0.1 G-D Losses} \label{fig:cg_drop1_1} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=14em]{fig/cgan_dropout01_ex.png} \caption{CGAN Dropout 0.1 Generated Images} \label{fig:cg_drop1_2} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/cgan_dropout05.png} \caption{CGAN Dropout 0.5 G-D Losses} \label{fig:cg_drop2_1} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=14em]{fig/cgan_dropout05_ex.png} \caption{CGAN Dropout 0.5 Generated Images} \label{fig:cg_drop2_2} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=24em]{fig/fake_only.png} \caption{Retraining with generated samples only} \label{fig:fake_only} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=12em]{fig/retrain_fail.png} \caption{Retraining failures} \label{fig:retrain_fail} \end{center} \end{figure}