From 672bdd094082d5be99b3149269a00f94875d0698 Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 17:20:03 +0000 Subject: Grammar fix --- report/paper.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/report/paper.md b/report/paper.md index 5587da4..c2c1a56 100644 --- a/report/paper.md +++ b/report/paper.md @@ -35,7 +35,7 @@ DCGAN exploits convolutional stride to perform downsampling and transposed convo We use batch normalization at the output of each convolutional layer (exception made for the output layer of the generator and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for generator) and `LeakyReLU` with slope 0.2 (for discriminator). -The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal droput rate of 0.25. +The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal dropout rate of 0.25. The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`. The main architecture used can be observed in figure \ref{fig:dcganarc}. @@ -82,7 +82,7 @@ Applying Virtual Batch Normalization our Medium DCGAN does not provide observabl \end{figure} We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation -of the droupout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches. +of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches. While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating the DCGAN is less prone to a collapse compared to our *vanilla GAN*. @@ -90,7 +90,7 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description -CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The baseline CGAN arhitecture presents a series blocks each contained a dense layer, LeakyReLu layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu (slope=0.2) and a Droupout layer. +CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The baseline CGAN architecture presents a series blocks each contained a dense layer, LeakyReLu layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu (slope=0.2) and a Droupout layer. The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`). The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{fig:cdcganarc}. @@ -117,7 +117,7 @@ We evaluate permutations of the architecture involving: When comparing the three levels of depth for the architectures it is possible to notice significant differences for the G-D losses balancing. In a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't experience any issues with vanishing gradient, hence no mode collapse is reached. -Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20,000 batches the some pictures appear to be slightly blurry (figure \ref{fig:clong}). +Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20,000 batches the some pictures appear to be slightly blurry (figure \ref{fig:clong}). The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5. The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise. @@ -244,7 +244,7 @@ we cannot achieve much better with this very small amount of data, since the val We conduct one experiment, feeding the test set to a LeNet trained exclusively on data generated from our CGAN. It is noticeable that training for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained when training the network with only the few real samples. This indicates that we can use the generated data to train the first steps of the network (initial weights) and apply the real sample for 300 epochs to obtain -a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine tuning will try and adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successfull at +a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine tuning will try and adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successful at improving testing accuracy. \begin{figure} @@ -259,7 +259,7 @@ improving testing accuracy. We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160,000) (learning curve show in figure \ref{fig:training_mixed}. The peak accuracy reached is 91%. We then try to remove the generated samples to apply fine tuning, using only the real samples. After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is -boosted to 92%, making this technique the most successfull attempt of improvement while using a limited amount of data from MNIST dataset. +boosted to 92%, making this technique the most successful attempt of improvement while using a limited amount of data from MNIST dataset. \begin{figure} \begin{center} @@ -307,7 +307,7 @@ TODO EXPLAIN WHAT WE HAVE DONE HERE ## Factoring in classification loss into GAN Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. Shane Barrat and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note]. -Nevertheless, a pretrained static classifier may be added to the GAN model, and it's loss incorporated into the loss added too the loss of the gan. +Nevertheless, a pretrained static classifier may be added to the GAN model, and it's loss incorporated into the loss added too the loss of the GAN. $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$ -- cgit v1.2.3-54-g00ecf