Diffstat (limited to 'report/paper.md')
-rw-r--r-- report/paper.md 28
1 files changed, 14 insertions, 14 deletions
diff --git a/report/paper.md b/report/paper.md
index 7b86846..74a72d3 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,6 +1,6 @@
# Introduction
-In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset and evaluate performance metrics across various optimisations techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.
+In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset and evaluate performance metrics across various optimization techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.
Generative Adversarial Networks present a system of models which learn to output data similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with.
@@ -67,7 +67,7 @@ between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}.
Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D losses. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique.
We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation
-of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches.
+of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely, a low dropout rate leads to an initial stabilization of G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of batches.
Trying different parameters for artificial G-D balancing in the training stage did not achieve any significant benefits,
exclusively leading to the generation of more artifacts (figure \ref{fig:baldc}). We also attempted to increase the D training steps with respect to G,
@@ -77,7 +77,7 @@ but no mode collapse was observed even with the shallow model.
## CGAN Architecture description
-CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific classes. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The generator's architecture presents a series of blocks, each containing a dense layer, `LeakyReLU` layer (`slope=0.2`) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by `LeakyReLU` (`slope=0.2`) and a Droupout layer.
+CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific classes. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The generator's architecture presents a series of blocks, each containing a dense layer, `LeakyReLU` layer (`slope=0.2`) and a Batch Normalization layer. The baseline discriminator uses Dense layers, followed by `LeakyReLU` (`slope=0.2`) and a Dropout layer.
The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`).
The architecture of the Deep Convolutional CGAN (cDCGAN) analysed is presented in the Appendix. It uses transpose convolutions with a stride of two to perform upscaling followed by convolutional blocks with singular stride. We find that kernel size of 3 by 3 worked well for all four convolutional blocks which include a Batch Normalization and an Activation layer (`ReLU` for generator and `LeakyReLU` for discriminator). The architecture assessed in this paper uses multiplying layers between the label embedding and the output `ReLU` blocks, as we found that it was more robust compared to the addition of the label embedding via concatenation. Label embedding
@@ -91,7 +91,7 @@ The list of the architecture we evaluate in this report:
* Deep Convolutional CGAN (cDCGAN)
* One-Sided Label Smoothing (LS)
* Various Dropout (DO): 0.1, 0.3 and 0.5
-* Virtual Batch Normalisation (VBN) - Normalisation based on one batch(BN) [@improved]
+* Virtual Batch Normalization (VBN) - Normalization based on one batch (BN) [@improved]
\begin{figure}
\begin{center}
@@ -134,7 +134,7 @@ Performance results for one-sided labels smoothing with `true_labels = 0.9` are
\end{center}
\end{figure}
-Virtual Batch normalization provides results that are difficult to qualitatively assess when compared to the ones obtained through the baseline.
+Virtual Batch Normalization provides results that are difficult to qualitatively assess when compared to the ones obtained through the baseline.
Applying this technique to Medium CGAN keeps G-D losses
mostly unchanged. The biggest change we expect to see is a lower dependence of the output on the individual batches. We expect this aspect to mostly affect
performance when training a classifier with the generated images from CGAN, as we will generate more robust output samples. Training with a larger batch size
@@ -205,12 +205,12 @@ be performed in the next section to state which ones are better (through Incepti
# Inception Score
-Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inception score.
+Inception score is calculated as introduced by Tim Salimans et al. [@improved]. However, as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the Inception score.
We use the logits extracted from LeNet:
$$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$
-We further report the classification accuracy as found with LeNet. For coherence purposes the inception scores were
+We further report the classification accuracy as found with LeNet. For coherence purposes the Inception Scores were
calculated by training the LeNet classifier under the same conditions across all experiments (100 epochs with `SGD`, `learning rate=0.001`).
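
As a minimal sketch of the score defined above, assuming the classifier returns one vector of logits per generated image, the computation can be written in plain Python. The function names `softmax` and `inception_score` are illustrative and are not taken from the code used for the experiments.

```python
import math

def softmax(logits):
    """Convert one vector of logits into class probabilities p(y|x)."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inception_score(probs):
    """Compute exp(E_x[KL(p(y|x) || p(y))]) from per-sample probabilities."""
    n = len(probs)
    k = len(probs[0])
    # Marginal p(y): average of the conditional distributions over all samples
    p_y = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence
        kl_sum += sum(pj * math.log(pj / p_y[j]) for j, pj in enumerate(p) if pj > 0)
    return math.exp(kl_sum / n)
```

Under these assumptions the score is bounded between 1 (every sample gets the same predicted distribution) and the number of classes (confident predictions spread evenly over all classes), which is 10 for MNIST.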
\begin{table}[H]
@@ -240,20 +240,20 @@ We observe increased accruacy as we increase the depth of the GAN arhitecture at
### One Side Label Smoothing
-One sided label smoothing involves relaxing our confidence on the labels in our data. Tim Salimans et. al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy in the case of our baseline (Medium CGAN). This technique however did not improve the performance of cDCGAN any further, suggesting that reinforcing discriminator behaviour does not benefit the system in this case.
+One-sided label smoothing involves relaxing our confidence on data labels. Tim Salimans et al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception Score and classification accuracy in the case of our baseline (Medium CGAN). This technique however did not improve the performance of cDCGAN any further, suggesting that reinforcing discriminator behaviour does not benefit the system in this case.
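
The idea can be sketched as follows, assuming a discriminator trained with binary cross-entropy and the `true_labels = 0.9` setting reported earlier. Only the targets of real samples are softened, while fake samples keep a target of 0. The helper names below are hypothetical.

```python
import math

def discriminator_targets(n_real, n_fake, true_label=0.9):
    """Targets for one discriminator batch: smoothed for real, 0 for fake."""
    return [true_label] * n_real + [0.0] * n_fake

def binary_cross_entropy(preds, targets, eps=1e-7):
    """Mean binary cross-entropy between predictions and (soft) targets."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(preds)
```

With a target of 0.9 the loss on real samples is minimised when the discriminator outputs 0.9 rather than 1, so overly confident predictions are penalised.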
-### Virtual Batch Normalisation
+### Virtual Batch Normalization
-Virtual Batch Normalisation is a further optimisation technique proposed by Tim Salimans et. al. [@improved]. Virtual batch normalisation is a modification to the batch normalisation layer, which performs normalisation based on statistics from a reference batch. We observe that VBN improved the classification accuracy and the Inception score due to the provided reduction in output dependency from the individual batches, ultimately resulting in a higher samples' quality.
+Virtual Batch Normalization is a further optimisation technique proposed by Tim Salimans et al. [@improved]. Virtual batch normalization is a modification to the batch normalization layer, which performs normalization based on statistics from a reference batch. We observe that VBN improved the classification accuracy and the Inception Score due to the reduced dependence of the output on individual batches, ultimately resulting in higher sample quality.
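
A simplified sketch of the normalization step is given below. It assumes the statistics come only from a fixed reference batch chosen once before training, and it omits the learnable scale and shift parameters of a full layer. The function names are illustrative, not those of our implementation.

```python
import math

def reference_stats(ref_batch, eps=1e-5):
    """Per-feature mean and standard deviation of a fixed reference batch."""
    n = len(ref_batch)
    d = len(ref_batch[0])
    mean = [sum(x[j] for x in ref_batch) / n for j in range(d)]
    var = [sum((x[j] - mean[j]) ** 2 for x in ref_batch) / n for j in range(d)]
    std = [math.sqrt(v + eps) for v in var]  # eps keeps the division stable
    return mean, std

def virtual_batch_norm(batch, mean, std):
    """Normalize every sample with the reference statistics only."""
    return [[(x[j] - mean[j]) / std[j] for j in range(len(x))] for x in batch]
```

Because the statistics are fixed, the normalized value of a sample does not depend on the other samples that happen to share its batch.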
### Dropout
-Despite the difficulties in judging differences between G-D losses and image quality, dropout rate seems to have a noticeable effect on accuracy and inception score, with a variation of 3.6% between our best and worst dropout cases. Ultimately, judging from the measurements, it is preferable to use a low dropout rate (0.1 seems to be the one that achieves the best results).
+Despite the difficulties in judging differences between G-D losses and image quality, dropout rate seems to have a noticeable effect on accuracy and Inception Score, with a variation of 3.6% between our best and worst dropout cases. Ultimately, judging from the measurements, it is preferable to use a low dropout rate (0.1 seems to be the one that achieves the best results).
### G-D Balancing on cDCGAN
Despite achieving lower loss oscillation, using G/D=3 to incentivize generator training did not improve the performance of cDCGAN, as observed from
-the inception score and testing accuracy. We obtain in fact 5% less test accuracy, meaning that using this technique in our architecture produces on
+the Inception Score and testing accuracy. In fact, we obtain 5% lower test accuracy, meaning that using this technique in our architecture produces on
average lower quality images when compared to our standard cDCGAN.
# Re-training the handwritten digit classifier
@@ -335,7 +335,7 @@ Using the pre-trained classification on real training examples we extract embedd
test examples and 10,000 randomly sampled synthetic examples using both CGAN and cDCGAN from the different classes.
We obtain both a PCA and a t-SNE representation of our data in two dimensions in figure \ref{fig:features}.
-It is observable that the network that achieved a good inception score (cDCGAN) produces embeddings that are very similar
+It is observable that the network that achieved a good Inception Score (cDCGAN) produces embeddings that are very similar
to the ones obtained from the original MNIST dataset, further strengthening our hypothesis about the performance of this
specific model. On the other hand, with non cDCGAN we notice higher correlation between the two represented features
for the different classes, meaning that a good data separation was not achieved. This is probably due to the additional blur
@@ -345,7 +345,7 @@ We have presented the Precision Recall Curve for the MNIST, against that of a De
## Factoring in classification loss into GAN
-Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. Shane Barrat and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note].
+Classification accuracy and Inception Score can be factored into the GAN to attempt to produce more realistic images. Shane Barratt and Rishi Sharma are able to indirectly optimise the Inception Score to over 900, and note that directly optimising for maximised Inception Score produces adversarial examples [@inception-note].
Nevertheless, a pre-trained static classifier may be added to the GAN model, and its loss added to the loss of the GAN.
$$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$
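
A minimal sketch of this weighted sum is shown below. It assumes the LeNet term is the cross-entropy of the classifier's prediction for a generated image against the label the generator was conditioned on. The function names and the default weights `alpha = beta = 1.0` are illustrative assumptions, not values we evaluated.

```python
import math

def classifier_loss(class_probs, label, eps=1e-7):
    """Cross-entropy of the static classifier on one generated image."""
    return -math.log(max(class_probs[label], eps))

def total_loss(class_probs, label, generator_loss, alpha=1.0, beta=1.0):
    """L_total = alpha * L_LeNet + beta * L_generator."""
    return alpha * classifier_loss(class_probs, label) + beta * generator_loss
```

Since the classifier is static, only the generator weights would be updated by this loss; the ratio of `alpha` to `beta` controls how strongly class fidelity is traded against fooling the discriminator.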