-rw-r--r-- | report/paper.md | 39
1 file changed, 19 insertions, 20 deletions
diff --git a/report/paper.md b/report/paper.md
index 3917bd9..8ee2eb4 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -85,17 +85,17 @@ but no mode collapse was observed even with the shallow model.

## CGAN Architecture description

-CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific classes. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The baseline CGAN architecture presents a series of blocks, each containing a dense layer, `LeakyReLu` layer (`slope=0.2`) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by `LeakyReLu` (`slope=0.2`) and a Droupout layer.
+CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels, which allow it to associate features with specific classes. The baseline CGAN we evaluate is shown in figure \ref{fig:cganarc}. The baseline CGAN architecture consists of a series of blocks, each containing a Dense layer, a `LeakyReLU` layer (`slope=0.2`) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by `LeakyReLU` (`slope=0.2`) and a Dropout layer.
The optimizer used for training is `Adam` (`learning_rate=0.002`, `beta=0.5`).

-The architecture of the Deep Convolutional CGAN (cDCGAN) analysed is presented in the Appendix. It uses transpose convolutions with a stride of two to perform upscaling followed by convolutional blocks with singular stride. We find that kernel size of 3 by 3 worked well for all four convolutional blocks which include a Batch Normalization and an Activation layer (`ReLu` for generator and `LeakyReLu` for discriminator). The architecture assessed in this paper uses multiplying layers between the label embedding and the output `ReLu` blocks, as we found that it was more robust compared to the addition of the label embedding via concatenation. Label embedding
-is performed with a `Dense+Tanh+Upsampling` block, both in the discriminator and the generator, feeding a 64x28x28 input for the multiplication layers.
+The architecture of the Deep Convolutional CGAN (cDCGAN) we analyse is presented in the Appendix. It uses transpose convolutions with a stride of two to perform upscaling, followed by convolutional blocks with unit stride. We find that a kernel size of 3 by 3 works well for all four convolutional blocks, each of which includes a Batch Normalization and an Activation layer (`ReLU` for the generator and `LeakyReLU` for the discriminator). The architecture assessed in this paper uses multiplication layers between the label embedding and the output of the `ReLU` blocks, as we found this to be more robust than adding the label embedding via concatenation. Label embedding
+is performed with a `Dense+Tanh+Upsampling` block, both in the discriminator and the generator, feeding a 64x28x28 input to the multiplication layers. The output activation layers of the generator and discriminator are `tanh` and `sigmoid` respectively.
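To make the building blocks in the hunk above concrete, here is a minimal `tf.keras` sketch of one baseline `Dense-LeakyReLU` block for each network, the quoted `Adam` settings, and the multiplicative label-embedding branch described for the cDCGAN. This is an illustration, not the authors' code: the layer widths, the one-hot label input and the channels-last shapes are assumptions.

```python
# Minimal sketch of the described blocks (tf.keras assumed); widths and shapes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def generator_block(x, units):
    # Dense -> LeakyReLU(slope=0.2) -> Batch Normalisation (baseline generator block)
    x = layers.Dense(units)(x)
    x = layers.LeakyReLU(0.2)(x)
    return layers.BatchNormalization()(x)

def discriminator_block(x, units, rate=0.3):
    # Dense -> LeakyReLU(slope=0.2) -> Dropout (baseline discriminator block)
    x = layers.Dense(units)(x)
    x = layers.LeakyReLU(0.2)(x)
    return layers.Dropout(rate)(x)

def label_embedding_branch(onehot_label, channels=64):
    # `Dense + Tanh + Upsampling` label branch, multiplied into the 64x28x28 feature
    # maps of the cDCGAN (exact shapes assumed from the 64x28x28 figure quoted above)
    e = layers.Dense(7 * 7 * channels, activation="tanh")(onehot_label)
    e = layers.Reshape((7, 7, channels))(e)
    return layers.UpSampling2D(size=(4, 4))(e)  # -> 28x28x64, ready for a Multiply layer

# merged = layers.Multiply()([conv_features, label_embedding_branch(label_in)])

# Optimizer with the quoted hyper-parameters (the quoted `beta` read as Adam's beta_1)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.002, beta_1=0.5)
```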
The list of the architectures we evaluate in this report:

-* Shallow CGAN - 1 `Dense-LeakyReLu` blocks
-* Medium CGAN - 3 `Dense-LeakyReLu` blocks
-* Deep CGAN - 5 `Dense-LeakyReLu` blocks
+* Shallow CGAN - 1 `Dense-LeakyReLU` block
+* Medium CGAN - 3 `Dense-LeakyReLU` blocks
+* Deep CGAN - 5 `Dense-LeakyReLU` blocks
* Deep Convolutional CGAN (cDCGAN)
* One-Sided Label Smoothing (LS)
* Various Dropout (DO): 0.1, 0.3 and 0.5

@@ -115,7 +115,7 @@ When comparing the three levels of depth for the baseline architecture it is pos
a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), as the generator is overpowered by the discriminator. Despite this we do not experience any issues with vanishing gradients, hence no mode collapse occurs. Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not very high: even after 20,000 batches some pictures appear slightly blurry (figure \ref{fig:clong}).
-The best compromise is reached for `3 Dense-LeakyReLu` blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1.
+The best compromise is reached for `3 Dense-LeakyReLU` blocks, as shown in figure \ref{fig:cmed}. It is possible to observe that the G and D losses are well balanced, and their value goes below 1.
The image quality is better than in the two examples reported earlier, confirming that this Medium-depth architecture is the best compromise.

\begin{figure}

@@ -182,7 +182,7 @@ Oscillation on the generator loss is noticeable in figure \ref{fig:cdcloss} due
adjustment to tackle this issue was balancing G-D training steps, opting for G/D=3, allowing the generator to gain some advantage over the discriminator. This technique allowed us to smooth the oscillation while producing images of similar quality. Using G/D=6 dampens the oscillation almost completely, leading to a vanishing discriminator gradient. Mode collapse occurs in this specific case, as shown in
-figure \ref{fig:cdccollapse}. Checking the embeddings extracted from a pretrained LeNet classifier (figure \ref{fig:clustcollapse}) we observe low diversity between features of each class, that
+figure \ref{fig:cdccollapse}. Checking the PCA embeddings extracted from a pretrained LeNet classifier (figure \ref{fig:clustcollapse}), we observe low diversity between the features of each class, which
tend to collapse to very small regions.

\begin{figure}

@@ -335,7 +335,7 @@ the overall performance.

## Relation to PCA

-Similarly to GANs, PCA can be used to formulate **generative** models of a system. While GANs are trained neural networks, PCA is a definite statistical procedure which perform orthogonal transformations of the data. Both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), but PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would be converging to PCA. In a more complicated system, we would indeed to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations.
+Similarly to GANs, PCA can be used to formulate **generative** models of a system. While GANs are trained neural networks, PCA is a deterministic statistical procedure which performs orthogonal transformations of the data. Both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), but PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would converge to PCA. In a more complicated system, we would need to identify suitable kernels in order to extract the relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform the relevant transformations.

## Data representation

@@ -349,6 +349,15 @@ specific model. On the other hand, with non cDCGAN we notice higher correlation
for the different classes, meaning that a good data separation was not achieved. This is probably due to the additional blur
produced around the images with our simple CGAN model.

+We present the Precision-Recall curve for MNIST together with those of the Dense CGAN and the Convolutional CGAN. While the superior performance of the convolutional GAN is evident, it is interesting to note that the precision curves are similar, specifically for the digits 8 and 9. For both architectures 9 is the worst digit on average, but at higher recall a small proportion of extremely poor 8's drags that digit down to the poorest precision.
+
+## Factoring in classification loss into GAN
+
+Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. Shane Barratt and Rishi Sharma are able to indirectly optimise the Inception score to over 900, and note that directly optimising for a maximised Inception score produces adversarial examples [@inception-note].
+Nevertheless, a pre-trained static classifier may be added to the GAN model, and its classification loss added to the loss of the GAN:
+
+$$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$
+
\begin{figure}
\centering
\subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-mnist.png}}\quad

@@ -361,8 +370,6 @@
\label{fig:features}
\end{figure}

-We have presented the Precision Recall Curve for the MNIST, against that of a Dense CGAN and Convolutional CGAN. While the superior performance of the convolutional GAN is evident, it is interesting to note that the precision curves are similar, specifically the numbers 8 and 9. For both architectures 9 is the worst digit on average, but for higher Recall we find that there is a smaller proportion of extremely poor 8's, which result in lower the digit to the poorest precision.
-
\begin{figure}
\centering
\subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-mnist.png}}\quad

@@ -372,14 +379,6 @@
\label{fig:rocpr}
\end{figure}

-## Factoring in classification loss into GAN
-
-Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. Shane Barrat and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note].
-Nevertheless, a pre-trained static classifier may be added to the GAN model, and its loss incorporated into the loss added too the loss of the GAN.
-
-$$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$
-
-

# References

<div id="refs"></div>

@@ -517,7 +516,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/clustcollapse.png}
-\caption{cDCGAN G/D=6 Embeddings through LeNet}
+\caption{cDCGAN G/D=6 PCA Embeddings through LeNet (10000 samples per class)}
\label{fig:clustcollapse}
\end{center}
\end{figure}
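The G/D training-step balancing discussed in the hunk at `@@ -182,7` can be sketched as a loop that updates the generator several times per discriminator update. Here `generator` and `discriminator` are hypothetical conditional Keras models (noise and label in, image out; image and label in, sigmoid probability out), and the latent size and optimiser settings are assumptions, not the authors' training code.

```python
# Sketch of balancing G/D training steps (e.g. G/D = 3); models are hypothetical.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(2e-3, beta_1=0.5)
d_opt = tf.keras.optimizers.Adam(2e-3, beta_1=0.5)
Z_DIM, G_STEPS_PER_D_STEP = 100, 3   # G/D = 3; G/D = 6 dampened oscillation but collapsed

def discriminator_step(generator, discriminator, real, labels):
    z = tf.random.normal([tf.shape(real)[0], Z_DIM])
    fake = generator([z, labels], training=False)
    with tf.GradientTape() as tape:
        p_real = discriminator([real, labels], training=True)
        p_fake = discriminator([fake, labels], training=True)
        loss = bce(tf.ones_like(p_real), p_real) + bce(tf.zeros_like(p_fake), p_fake)
    grads = tape.gradient(loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    return loss

def generator_step(generator, discriminator, labels):
    z = tf.random.normal([tf.shape(labels)[0], Z_DIM])
    with tf.GradientTape() as tape:
        fake = generator([z, labels], training=True)
        p_fake = discriminator([fake, labels], training=False)
        loss = bce(tf.ones_like(p_fake), p_fake)   # non-saturating generator loss
    grads = tape.gradient(loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return loss

def train_epoch(generator, discriminator, dataset):
    for real, labels in dataset:                   # one D update, several G updates
        discriminator_step(generator, discriminator, real, labels)
        for _ in range(G_STEPS_PER_D_STEP):
            generator_step(generator, discriminator, labels)
```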
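The mode-collapse check through a pretrained LeNet (figure \ref{fig:clustcollapse}) and the feature comparison in the Data representation section can be approximated as below. The pretrained classifier `lenet` and the choice of its penultimate layer as the feature space are assumptions; only the PCA projection of classifier features is taken from the text.

```python
# Sketch: project penultimate-layer LeNet features of generated digits with PCA.
from sklearn.decomposition import PCA
from tensorflow.keras import Model

def pca_embed(lenet, images, n_components=2):
    # drop the softmax layer, keep the feature layer just before it
    features = Model(lenet.input, lenet.layers[-2].output).predict(images, verbose=0)
    return PCA(n_components=n_components).fit_transform(features)

# embedding = pca_embed(lenet, generated_digits)
# collapsed classes show up as features crowding into very small regions
```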
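One plausible way to obtain the per-digit precision-recall curves compared for MNIST, the Dense CGAN and the convolutional CGAN is to use a pretrained classifier's class probability as the detection score. This recipe is an assumption, not necessarily the exact procedure behind figure \ref{fig:rocpr}.

```python
# Sketch: per-digit precision-recall from a pretrained classifier's softmax scores.
from sklearn.metrics import precision_recall_curve

def digit_pr_curve(lenet, images, true_labels, digit):
    # `lenet` is a hypothetical pretrained classifier returning softmax probabilities;
    # `true_labels` is an integer array of the labels the samples were conditioned on.
    scores = lenet.predict(images, verbose=0)[:, digit]   # probability assigned to `digit`
    return precision_recall_curve(true_labels == digit, scores)

# precision, recall, thresholds = digit_pr_curve(lenet, generated_digits, labels, digit=8)
```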
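Finally, the combined objective $L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}$ from the added "Factoring in classification loss into GAN" section could be wired into the generator update roughly as follows; the frozen classifier, the one-hot labels and the $\alpha$, $\beta$ values are illustrative assumptions.

```python
# Sketch: add a frozen LeNet's classification loss to the usual generator loss.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()
alpha, beta = 1.0, 1.0                      # untuned illustration values

def total_generator_loss(generator, discriminator, lenet, z, onehot_labels):
    fake = generator([z, onehot_labels], training=True)
    p_fake = discriminator([fake, onehot_labels], training=False)
    l_generator = bce(tf.ones_like(p_fake), p_fake)            # usual GAN generator loss
    l_lenet = cce(onehot_labels, lenet(fake, training=False))  # frozen classifier loss on fakes
    return alpha * l_lenet + beta * l_generator                # L_total
```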