-rw-r--r-- | report/bibliography.bib | 7
-rw-r--r-- | report/paper.md | 51
2 files changed, 31 insertions, 27 deletions
diff --git a/report/bibliography.bib b/report/bibliography.bib index 430d8b5..e778423 100644 --- a/report/bibliography.bib +++ b/report/bibliography.bib @@ -1,3 +1,10 @@ +@misc{adam, +Author = {Diederik P. Kingma and Jimmy Ba}, +Title = {Adam: A Method for Stochastic Optimization}, +Year = {2014}, +Eprint = {arXiv:1412.6980}, +} + @INPROCEEDINGS{lenet, author = {Yann Lecun and Léon Bottou and Yoshua Bengio and Patrick Haffner}, title = {Gradient-based learning applied to document recognition}, diff --git a/report/paper.md b/report/paper.md index 74a72d3..afcb418 100644 --- a/report/paper.md +++ b/report/paper.md @@ -2,11 +2,11 @@ In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset and evaluate performance metrics across various optimizations techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits. -Generative Adversarial Networks present a system of models which learn to output data similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with. +Generative Adversarial Networks are a class of models characterised by their ability to output data similar to their training data. A trained GAN takes noise as input and produces an output with the same dimensions and relevant features as the samples it was trained on. -GANs employ two neural networks - a *discriminator* and a *generator* which contest in a min-max game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator. +GANs employ two neural networks, a *discriminator* and a *generator*, which compete in a min-max game. 
The task of the *discriminator* is to distinguish generated images from real ones, while the task of the *generator* is to produce realistic images that fool the *discriminator*. -Training a shallow GAN with no convolutional layers poses problems such as mode collapse and unbalanced G-D losses which lead to low quality image output. +Training a shallow GAN with no convolutional layers exposes problems such as **mode collapse** and unbalanced *generator-discriminator* losses, which lead to **vanishing gradients** and **low-quality image output**. \begin{figure} \begin{center} @@ -16,22 +16,21 @@ Training a shallow GAN with no convolutional layers poses problems such as mode \end{center} \end{figure} -Some of the main challanges faced when training a GAN are: **mode collapse**, **low quality** of images and **mismatch** between generator and discriminator loss. Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanilla_gan}) implementation after 200,000 batches. The generated images observed during a mode collapse can be seen in figure \ref{fig:mode_collapse}. The output of the generator only represents few of the labels originally fed. When mode collapse is reached the loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}. We observe the discriminator loss tends to zero as the discriminator learns to assume and classify the fake 1s, while the generator is stuck producing 1 and hence not able to improve. +Our naive *vanilla GAN* implementation (Appendix-\ref{fig:vanilla_gan}) reaches mode collapse after 200,000 batches. The generated images observed during mode collapse can be seen in figure \ref{fig:mode_collapse}. We observe that the output of the generator represents only a few of the digit classes present in the training data. When mode collapse is reached, the generator loss stops improving, as shown in figure \ref{fig:vanilla_loss}. 
We observe that the discriminator loss tends to zero as the discriminator learns to recognise the generator's fake 1s, while the generator remains stuck and is therefore unable to improve. -An improvement to the vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN). +A marked improvement to the vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN). # DCGAN ## DCGAN Architecture description -DCGAN exploits convolutional stride to perform downsampling and transposed convolution to perform upsampling. +DCGAN exploits convolutional stride to perform downsampling and transposed convolutions to perform upsampling, in contrast to the fully connected layers in a vanilla GAN. -We use batch normalization at the output of each convolutional layer (exception made for the output layer of the generator -and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for generator) and `LeakyReLU` with slope 0.2 (for discriminator). -The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal dropout rate of 0.25. +The tested implementation uses batch normalization at the output of each convolutional layer, except for the output layer of the generator and the input layer of the discriminator. The activation functions of the intermediate layers are `ReLU` for the generator and `LeakyReLU` with slope 0.2 for the discriminator. +The output activation functions are `tanh` for the generator and `sigmoid` for the discriminator. In the discriminator, dropout is applied to the output of each convolutional layer before it is fed into the next layer. Adding dropout noticeably improved performance, and we measured a dropout rate of 0.25 to perform well. 
The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`. -The main architecture used can be observed in figure \ref{fig:dcganarc}. +The base architecture is shown in figure \ref{fig:dcganarc}. \begin{figure} \begin{center} @@ -43,11 +42,11 @@ The main architecture used can be observed in figure \ref{fig:dcganarc}. ## Tests on MNIST -We evaluate three different GAN architectures, varying the size of convolutional layers in the generator, while retaining the structure presented in figure \ref{fig:dcganarc}: +We evaluate three variants of the DCGAN architecture, varying the size of the convolutional layers in the generator while retaining the structure presented in figure \ref{fig:dcganarc}: -* Shallow: Conv128-Conv64 -* Medium: Conv256-Conv128 -* Deep: Conv512-Conv256 +* Shallow: `Conv128-Conv64` +* Medium: `Conv256-Conv128` +* Deep: `Conv512-Conv256` \begin{figure} \begin{center} @@ -61,12 +60,12 @@ We evaluate three different GAN architectures, varying the size of convolutional We observed that the deep architectures result in a more easily achievable equilibria of G-D losses. Our medium depth DCGAN achieves very good performance (figure \ref{fig:dcmed}), balancing both binary cross entropy losses at approximately 0.9 after 5,000 batches, reaching equilibrium quicker and with less oscillation than the Deepest DCGAN tested (figure \ref{fig:dclong}). -As DCGAN is trained with no labels, the generator's primary objective is to output images that fool the discriminator, but does not intrinsically separate the classes from each another. Therefore we sometimes observe oddly shaped digits which may temporarily be labeled as real by the discriminator. +As DCGAN is trained without labels, the generator's primary objective is to output images that fool the discriminator; it does not intrinsically separate the classes from one another. Therefore we sometimes observe oddly shaped digits which may temporarily be labeled as real by the discriminator. 
This issue is solved by training the network for more batches or introducing a deeper architecture, as it can be deducted from a qualitative comparison +As DCGAN is trained with no labels, the generator's primary objective is to output images that fool the discriminator, but does not intrinsically separate the classes from each another. Therefore we sometimes observe oddly shaped digits which may temporarily be labeled as real by the discriminator. This issue is alleviated by training the network for more batches or introducing a deeper architecture, as it can be deducted from a qualitative comparison between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}. Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D losses. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique. -We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation +We evaluated the effect of different dropout rates (results in Appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilization of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches. Trying different parameters for artificial G-D balancing in the training stage did not achieve any significant benefits, @@ -77,17 +76,16 @@ but no mode collapse was observed even with the shallow model. 
## CGAN Architecture description -CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific classes. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The generator's architecture presents a series of blocks, each containing a dense layer, `LeakyReLU` layer (`slope=0.2`) and a Batch Normalization layer. The baseline discriminator uses Dense layers, followed by `LeakyReLU` (`slope=0.2`) and a Droupout layer. -The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`). +CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels, which allow it to associate features with specific classes. The baseline CGAN architecture we evaluate is shown in figure \ref{fig:cganarc}. The generator consists of a series of blocks, each containing a dense layer, a `LeakyReLU` layer (`slope=0.2`) and a Batch Normalization layer. The baseline discriminator uses Dense layers, followed by `LeakyReLU` (`slope=0.2`) and a Dropout layer. For training we used the Adam optimizer [@adam] with a learning rate of $0.002$ and $\beta_1=0.5$. -The architecture of the Deep Convolutional CGAN (cDCGAN) analysed is presented in the Appendix. It uses transpose convolutions with a stride of two to perform upscaling followed by convolutional blocks with singular stride. We find that kernel size of 3 by 3 worked well for all four convolutional blocks which include a Batch Normalization and an Activation layer (`ReLU` for generator and `LeakyReLU` for discriminator). The architecture assessed in this paper uses multiplying layers between the label embedding and the output `ReLU` blocks, as we found that it was more robust compared to the addition of the label embedding via concatenation.
Label embedding -is performed with a `Dense+Tanh+Upsampling` block, both in the discriminator and the generator, feeding a 64x28x28 input for the multiplication layers. The output activation layers for generator and discriminator are respectively `tanh` and `sigmoid`. +We also evaluate a Deep Convolutional version of CGAN (cDCGAN), whose architecture is given in the Appendix. It uses transposed convolutions with a stride of two to perform upsampling, followed by convolutional blocks with unit stride. We found that a kernel size of $3\times 3$ worked well for all four convolutional blocks, each of which includes a Batch Normalization and an activation layer (`ReLU` for the generator and `LeakyReLU` for the discriminator). The architecture assessed in this paper uses multiplication layers between the label embedding and the output `ReLU` blocks, as we found this more robust than injecting the label embedding via concatenation. Label embedding +is performed with a `Dense`, `tanh` and `Upsampling` block, both in the discriminator and the generator, producing a $64\times 28\times 28$ input for the multiplication layers. The output activation layers for the generator and discriminator are `tanh` and `sigmoid` respectively. The list of the architecture we evaluate in this report: -* Shallow CGAN - 1 `Dense-LeakyReLU` blocks -* Medium CGAN - 3 `Dense-LeakyReLU` blocks -* Deep CGAN - 5 `Dense-LeakyReLU` blocks +* Shallow CGAN - $1\times$ `Dense-LeakyReLU` block +* Medium CGAN - $3\times$ `Dense-LeakyReLU` blocks +* Deep CGAN - $5\times$ `Dense-LeakyReLU` blocks * Deep Convolutional CGAN (cDCGAN) * One-Sided Label Smoothing (LS) * Various Dropout (DO): 0.1, 0.3 and 0.5 @@ -103,10 +101,9 @@ The list of the architecture we evaluate in this report: ## Tests on MNIST -When comparing the three levels of depth for the baseline architecture it is possible to notice significant differences in G-D losses balancing.
In -a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't -experience any issues with vanishing gradient, hence no mode collapse is reached. -Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20,000 batches some pictures appear to be slightly blurry (figure \ref{fig:clong}). +When comparing the three levels of depth for the baseline architecture, we observe significant differences in G-D loss balancing. In +a shallow architecture the generator loss oscillates strongly (figure \ref{fig:cshort}), as the generator is overpowered by the discriminator. Despite this, the Dense CGAN did not suffer from vanishing gradients, and no mode collapse occurred. +Similarly, with a deep architecture the discriminator still overpowers the generator, and the two losses do not reach an equilibrium. The image quality in both cases is not particularly high: even after 20,000 batches some pictures appear slightly blurry (figure \ref{fig:clong}). The best compromise is reached for `3 Dense-LeakyReLU` blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1. The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise.
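The DCGAN description in the patch above relies on strided convolutions to halve, and transposed convolutions to double, the spatial size of the feature maps. The sketch below is a hedged illustration of that arithmetic, not code from this coursework: it applies the standard output-size formulas for a 2D convolution and a 2D transposed convolution (as given in the PyTorch `Conv2d`/`ConvTranspose2d` documentation for dilation 1). The helper names `conv_out` and `conv_transpose_out`, the 7x7 starting map, and the kernel size 4, stride 2, padding 1 settings are all assumptions, chosen so that the sizes line up with 28x28 MNIST images.

```python
# Hypothetical helpers: standard output-size formulas for 2D (transposed)
# convolutions, assuming dilation 1 and square inputs/kernels.


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a (possibly strided) convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int,
                       output_padding: int = 0) -> int:
    """Spatial size after a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Assumed settings (not taken from the report): kernel 4, stride 2, padding 1.
KERNEL, STRIDE, PADDING = 4, 2, 1

# Generator: upsample an assumed 7x7 map twice with transposed convolutions.
generator_sizes = [7]
for _ in range(2):
    generator_sizes.append(
        conv_transpose_out(generator_sizes[-1], KERNEL, STRIDE, PADDING))

# Discriminator: downsample a 28x28 MNIST image twice with strided convolutions.
discriminator_sizes = [28]
for _ in range(2):
    discriminator_sizes.append(
        conv_out(discriminator_sizes[-1], KERNEL, STRIDE, PADDING))

print("generator:", generator_sizes)          # [7, 14, 28]
print("discriminator:", discriminator_sizes)  # [28, 14, 7]
```

Under these assumptions the generator maps 7 → 14 → 28 and the discriminator mirrors it, 28 → 14 → 7. A unit-stride 3x3 convolution with padding 1, as in the cDCGAN blocks, keeps the size unchanged: `conv_out(28, 3, 1, 1)` returns 28.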