Diffstat (limited to 'report/paper.md')
-rw-r--r-- report/paper.md | 63
1 files changed, 37 insertions, 26 deletions
diff --git a/report/paper.md b/report/paper.md
index 3ea9b94..1686bc0 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -2,8 +2,6 @@
 In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset and evaluate performance metrics across various optimisations techniques.
 The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.
 
-## GAN
-
 Generative Adversarial Networks present a system of models which learn to output data, similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with.
 
 GAN's employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator.
@@ -23,6 +21,12 @@ Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanill
 
 A significant improvement to this vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN).
 
+It is possible to artificially balance the number of steps between G and D backpropagation, however we think with a solid GAN structure this step is not
+really needed. Updating D more frequently than G resulted in additional cases of mode collapse due to the vanishing gradient issue. Updating G more
+frequently has not proved to be beneficial either, as the discriminator did not learn how to distinguish real samples from fake samples quickly enough.
+For this reasons the following sections will not present any artificial balancing of G-D training steps, opting for a standard single step update for both
+discriminator and generator.
+
 # DCGAN
 
 ## DCGAN Architecture description
@@ -62,7 +66,7 @@ We evaluate three different GAN architectures, varying the size of convolutional
 \end{figure}
 
 We observed that the deep architectures result in a more easily achievable equilibria of G-D losses.
-Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5.000 batches, reaching equilibrium quicker and with less oscillation that the Deepest DCGAN tested.
+Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5,000 batches, reaching equilibrium quicker and with less oscillation that the Deepest DCGAN tested.
 As DCGAN is trained with no labels, the generator primary objective is to output images that fool the discriminator, but does not intrinsically separate the classes form one another.
 Therefore we sometimes observe oddly shape fused digits which may temporarily full be labeled real by the discriminator.
 This issue is solved by training the network for more batches or introducing a deeper architecture, as it can be deducted from a qualitative comparison between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}.
@@ -113,7 +117,7 @@ We evaluate permutations of the architecture involving:
 When comparing the three levels of depth for the architectures it is possible to notice significant differences for the G-D losses balancing.
 In a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator.
 Despite this we don't experience any issues with vanishing gradient, hence no mode collapse is reached.
-Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20000 batches the some pictures appear to be slightly blurry \ref{fig:clong}.
+Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20,000 batches the some pictures appear to be slightly blurry \ref{fig:clong}.
 The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}.
 It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5.
 The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise.
@@ -160,7 +164,8 @@ We use the logits extracted from LeNet:
 $$
 \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) )
 $$
-We further report the classification accuracy as found with LeNet.
+We further report the classification accuracy as found with LeNet. For coherence purposes the inception scores were
+calculated training the LeNet classifier under the same conditions across all experiments (100 epochs with SGD optimizer, learning rate = 0.001).
 
 \begin{table}[H]
 \begin{tabular}{llll}
@@ -207,7 +212,7 @@ injecting generated samples in the original training set to boost testing accura
 
 As observed in figure \ref{fig:mix1} we performed two experiments for performance evaluation:
 
-* Keeping the same number of training samples while just changing the amount of real to generated data (55.000 samples in total).
+* Keeping the same number of training samples while just changing the amount of real to generated data (55,000 samples in total).
 * Keeping the whole training set from MNIST and adding generated samples from CGAN.
 
 \begin{figure}
@@ -252,7 +257,7 @@ improving testing accuracy.
 \end{figure}
 
 
-We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160.000)
+We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160,000)
 (learning curve show in figure \ref{fig:training_mixed}. The peak accuracy reached is 91%.
 We then try to remove the generated samples to apply fine tuning, using only the real samples.
 After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is boosted to 92%, making this technique the most successfull attempt of improvement while using a limited amount of data from MNIST dataset.
@@ -285,7 +290,7 @@ TODO EXPLAIN WHAT WE HAVE DONE HERE
  \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\
  \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad
  \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}}
- \caption{Visualisations PCA: a) MNIST c) CGAN | TSNE b) MNIST d) CGAN}
+ \caption{Visualisations: a)MNIST|PCA b)MNIST|TSNE c)CGAN-gen|PCA d)CGAN-gen|TSNE}
  \label{fig:features}
 \end{figure}
 
@@ -314,7 +319,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 
 # Appendix
 
-\begin{figure}
+## DCGAN-Appendix
+
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/vanilla_gan_arc.pdf}
 \caption{Vanilla GAN Architecture}
@@ -322,7 +329,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/generic_gan_loss.png}
 \caption{Shallow GAN D-G Loss}
@@ -330,7 +337,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/short_dcgan_ex.png}
 \includegraphics[width=24em]{fig/short_dcgan.png}
@@ -339,7 +346,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/long_dcgan_ex.png}
 \includegraphics[width=24em]{fig/long_dcgan.png}
@@ -348,7 +355,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/dcgan_dropout01_gd.png}
 \caption{DCGAN Dropout 0.1 G-D Losses}
@@ -356,7 +363,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=14em]{fig/dcgan_dropout01.png}
 \caption{DCGAN Dropout 0.1 Generated Images}
@@ -364,7 +371,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/dcgan_dropout05_gd.png}
 \caption{DCGAN Dropout 0.5 G-D Losses}
@@ -372,7 +379,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=14em]{fig/dcgan_dropout05.png}
 \caption{DCGAN Dropout 0.5 Generated Images}
@@ -380,7 +387,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+## CGAN-Appendix
+
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/CDCGAN_arch.pdf}
 \caption{Deep Convolutional CGAN Architecture}
@@ -388,7 +397,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/short_cgan_ex.png}
 \includegraphics[width=24em]{fig/short_cgan.png}
@@ -397,7 +406,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/long_cgan_ex.png}
 \includegraphics[width=24em]{fig/long_cgan.png}
@@ -406,7 +415,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/cgan_dropout01.png}
 \caption{CGAN Dropout 0.1 G-D Losses}
@@ -414,7 +423,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=14em]{fig/cgan_dropout01_ex.png}
 \caption{CGAN Dropout 0.1 Generated Images}
@@ -422,7 +431,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/cgan_dropout05.png}
 \caption{CGAN Dropout 0.5 G-D Losses}
@@ -430,7 +439,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=14em]{fig/cgan_dropout05_ex.png}
 \caption{CGAN Dropout 0.5 Generated Images}
@@ -438,7 +447,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=12em]{fig/good_ex.png}
 \includegraphics[width=12em]{fig/bad_ex.png}
@@ -448,7 +457,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+## Retrain-Appendix
+
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=24em]{fig/fake_only.png}
 \caption{Retraining with generated samples only}
@@ -456,7 +467,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
-\begin{figure}
+\begin{figure}[H]
 \begin{center}
 \includegraphics[width=12em]{fig/retrain_fail.png}
 \caption{Retraining failures}
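The Inception Score formula in the hunk at old line 160 (`\textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) )`, computed from LeNet logits rather than from an Inception network) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code behind this commit: the names `softmax` and `inception_score` are hypothetical, and the input is assumed to be an N x C list of logits already produced by a trained classifier, with p(y) estimated as the mean of p(y|x) over the batch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of classifier logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inception_score(logits: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """IS = exp(E_x[KL(p(y|x) || p(y))]) from an N x C matrix of logits.

    Each row holds the (assumed) classifier logits for one generated image,
    e.g. C = 10 for MNIST. `eps` only guards log(0) for saturated outputs.
    """
    probs = [softmax(row) for row in logits]  # p(y|x), one row per sample
    n, c = len(probs), len(probs[0])
    # Marginal label distribution p(y), estimated as the mean of p(y|x)
    marginal = [sum(p[k] for p in probs) / n for k in range(c)]
    mean_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)) for this sample
        mean_kl += sum(pk * (math.log(pk + eps) - math.log(qk + eps))
                       for pk, qk in zip(p, marginal))
    mean_kl /= n
    return math.exp(mean_kl)


if __name__ == "__main__":
    # Confident predictions spread evenly over 10 classes -> IS close to 10
    diverse = [[50.0 if k == i else 0.0 for k in range(10)] for i in range(10)]
    # Confident predictions that all name the same class -> IS close to 1
    collapsed = [[50.0 if k == 0 else 0.0 for k in range(10)] for _ in range(10)]
    print(round(inception_score(diverse), 3), round(inception_score(collapsed), 3))
    # prints: 10.0 1.0
```

The score is bounded between 1 and the number of classes, so for MNIST the ceiling is 10. A collapsed generator that always produces the same digit also scores close to 1, because p(y|x) then matches p(y), which is why the score is a useful companion to the G-D loss curves when checking for mode collapse.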
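The first retraining experiment in the hunk at old line 207 keeps the training set at 55,000 samples while varying the proportion of real to generated data. A minimal sketch of that sampling step follows, assuming the real and generated samples are held in two in-memory pools (in practice, lists of (image, label) pairs); `mix_fixed_size` and its parameters are hypothetical names, not taken from the repository.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_fixed_size(real: Sequence[T], generated: Sequence[T], real_fraction: float,
                   total: int = 55_000, seed: int = 0) -> List[T]:
    """Return `total` samples: a `real_fraction` share drawn from `real`, the rest
    from `generated`, both without replacement, then shuffled together."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    n_real = round(total * real_fraction)
    n_gen = total - n_real
    if n_real > len(real) or n_gen > len(generated):
        raise ValueError("a pool is too small for the requested split")
    rng = random.Random(seed)  # fixed seed so each split is reproducible
    mixed = rng.sample(list(real), n_real) + rng.sample(list(generated), n_gen)
    rng.shuffle(mixed)  # interleave real and generated samples across batches
    return mixed


if __name__ == "__main__":
    # Toy pools tagged by origin; real use would hold (image, label) pairs
    real = [("real", i) for i in range(100)]
    fake = [("gen", i) for i in range(100)]
    batch = mix_fixed_size(real, fake, real_fraction=0.3, total=50)
    print(len(batch), sum(1 for src, _ in batch if src == "real"))
    # prints: 50 15
```

The second experiment in the same hunk (keeping the whole MNIST training set and adding CGAN samples) needs no fixed budget, so it reduces to concatenating the two pools before shuffling.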