 report/paper.md | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index ee4a626..2ba5401 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -4,13 +4,13 @@ In this coursework we present two variants of the GAN architecture - DCGAN and C

 Generative Adversarial Networks present a system of models which learn to output data, similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with.

-GANs employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator.
+GANs employ two neural networks - a *discriminator* and a *generator* which contest in a min-max game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator.

 Training a shallow GAN with no convolutional layers poses problems such as mode collapse and unbalanced G-D losses which lead to low quality image output.

 \begin{figure}
 \begin{center}
-\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf}
+\includegraphics[width=16em]{fig/generic_gan_mode_collapse.pdf}
 \caption{Vanilla GAN mode collapse}
 \label{fig:mode_collapse}
 \end{center}
@@ -20,10 +20,6 @@ Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanill

 A significant improvement to this vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN).

-It is possible to artificially balance the number of steps between G and D backpropagation, however we think with a solid GAN structure this step is not
-really needed. Updating D more frequently than G resulted in additional cases of mode collapse due to the vanishing gradient issue. Updating G more
-frequently has not proved to be beneficial either, as the discriminator did not learn how to distinguish real samples from fake samples quickly enough.
-
 # DCGAN

 ## DCGAN Architecture description
@@ -68,12 +64,12 @@ Our medium depth DCGAN achieves very good performance, balancing both binary cro

 As DCGAN is trained with no labels, the generator primary objective is to output images that fool the discriminator, but does not intrinsically separate the classes form one another. Therefore we sometimes observe oddly shape fused digits which may temporarily full be labeled real by the discriminator. This issue is solved by training the network for more batches or introducing a deeper architecture, as it can be deducted from a qualitative comparison between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}.

-Applying Virtual Batch Normalization our Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique.
+Applying Virtual Batch Normalization our Medium DCGAN does not provide observable changes in G-D losses, but reduces within-batch correlation. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique.

 We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches.

-Trying different parameters for artificial G-D balancing in the training stage did not achieve any significant benefits as discussed in section I,
+Trying different parameters for artificial G-D balancing in the training stage did not achieve any significant benefits,
 exclusively leading to the generation of more artifacts (figure \ref{fig:baldc}). We also attempted to increase the D training steps with respect to G,
 but no mode collapse was observed even with the shallow model.
@@ -85,8 +81,6 @@ but no mode collapse was observed even with the shallow model.
 \end{center}
 \end{figure}

-While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating the DCGAN is less prone to a collapse compared to our *vanilla GAN*.
-
 # CGAN

 ## CGAN Architecture description
@@ -168,7 +162,7 @@ the same classes, indicating that mode collapse still did not occur.

 The best performing architecture was CDCGAN. It is difficult to assess any potential improvement at this stage, since the samples produced between 8,000 and 13,000 batches are indistinguishable from the ones of the MNIST dataset (as it can be seen in figure \ref{fig:cdc}, middle). Training CDCGAN for more than
-15,000 batches is however not beneficial, as the discriminator will almost reach a loss of zero, leading the generator to oscillate and produce bad samples as shown in the reported example.
+15,000 batches is however not beneficial, as the discriminator will keep improving, leading the generator to oscillate and produce bad samples as shown in the reported example.
 We find a good balance for 12,000 batches.

 \begin{figure}
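For context on the techniques touched by this change: the artificial G-D balancing described in the removed introduction paragraph (and revisited in the DCGAN results) amounts to running a different number of discriminator updates per generator update. The sketch below illustrates the idea with a toy fully-connected GAN on MNIST, assuming a Keras/TensorFlow setup; the layer sizes, learning rates and `d_steps` values are illustrative assumptions, not the configuration used in the report.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

latent_dim = 100

def build_generator():
    # small fully-connected generator; sizes are placeholders
    return models.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dense(28 * 28, activation="tanh"),
        layers.Reshape((28, 28)),
    ])

def build_discriminator():
    return models.Sequential([
        tf.keras.Input(shape=(28, 28)),
        layers.Flatten(),
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 127.5 - 1.0  # scale to [-1, 1] for the tanh output

G, D = build_generator(), build_discriminator()
D.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

# combined model: D is frozen here so that only G is updated when trying to fool D
D.trainable = False
combined = models.Sequential([G, D])
combined.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

def train(batches=500, batch_size=64, d_steps=1):
    """d_steps is the artificial balance: D is updated d_steps times per G update."""
    real_y = np.ones((batch_size, 1))
    fake_y = np.zeros((batch_size, 1))
    for step in range(batches):
        for _ in range(d_steps):
            idx = np.random.randint(0, x_train.shape[0], batch_size)
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            d_loss_real = D.train_on_batch(x_train[idx], real_y)
            d_loss_fake = D.train_on_batch(G.predict(noise, verbose=0), fake_y)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        g_loss = combined.train_on_batch(noise, real_y)  # G is rewarded when D outputs "real"
        if step % 100 == 0:
            print(step, 0.5 * (d_loss_real + d_loss_fake), g_loss)

train(d_steps=2)  # e.g. two D updates per G update
```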
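The dropout evaluation in the DCGAN results varies a single discriminator hyper-parameter. The following sketch shows where that rate enters a DCGAN-style discriminator; the filter counts, kernel sizes and the rates swept are assumptions for illustration, not the report's medium-depth architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dcgan_discriminator(dropout_rate=0.25):
    # convolutional discriminator with the dropout rate exposed as the tuned hyper-parameter
    return models.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=3, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Dropout(dropout_rate),  # too high: D underfits, G "wins" and emits artifacts
        layers.Conv2D(64, kernel_size=3, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Dropout(dropout_rate),  # too low: losses stabilise early, then oscillate
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])

# sweep the rate, analogous to the appendix comparison
for rate in (0.1, 0.25, 0.5):
    d = build_dcgan_discriminator(rate)
    d.compile(optimizer="adam", loss="binary_crossentropy")
```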
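Virtual Batch Normalization normalises activations with statistics taken from a fixed reference batch chosen once before training, rather than from the current batch, which is what removes the within-batch correlation mentioned in the DCGAN results. A simplified sketch follows, using reference-batch statistics only and omitting the per-example correction of the original formulation.

```python
import numpy as np

class VirtualBatchNorm:
    def __init__(self, reference_batch, eps=1e-5):
        # statistics are frozen once, from the reference batch
        self.mu = reference_batch.mean(axis=0, keepdims=True)
        self.sigma = reference_batch.std(axis=0, keepdims=True)
        self.eps = eps

    def __call__(self, x):
        # each sample is normalised against the reference, not its own batch
        return (x - self.mu) / (self.sigma + self.eps)

rng = np.random.default_rng(0)
reference = rng.normal(size=(128, 64))  # e.g. activations of a fixed reference batch
vbn = VirtualBatchNorm(reference)
batch = rng.normal(size=(32, 64))
normalised = vbn(batch)  # output depends only on the reference statistics
```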