From a4181df0b5cd0cea323139e7407b8fe1b7d0ad73 Mon Sep 17 00:00:00 2001 From: nunzip Date: Thu, 7 Mar 2019 19:43:39 +0000 Subject: Writing more DCGAN --- report/paper.md | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 100887f..371cd7f 100644 --- a/report/paper.md +++ b/report/paper.md @@ -11,7 +11,7 @@ GAN's employ two neural networks - a *discriminator* and a *generator* which con Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse and generating low quality images due to unbalanced G-D losses. -Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200.000 iterations of the GAN network **presented in appendix XXX**. The output of the generator only represents few of the labels originally fed. At that point the loss function of the generator stops +Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200,000 iterations of the GAN network presented in the appendix, figure \ref{fig:vanilla_gan}. The output of the generator represents only a few of the labels originally fed. At that point the loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}. We observe the discriminator loss tending to zero as it learns to classify the fake 1's, while the generator is stuck producing 1's. \begin{figure} @@ -104,7 +104,17 @@ While training the different proposed DCGAN architectures, we did not observe mo the simple GAN presented in the introduction. Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it -is difficult to qualitatively assess the improvements, figure \ref{fig:} shows results of the introduction of this technique.
+is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique. + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/vbn_dc.pdf} +\caption{DCGAN Virtual Batch Normalization} +\label{fig:vbn_dc} +\end{center} +\end{figure} + + # CGAN @@ -138,15 +148,15 @@ with L2-Net logits. $$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) \right) ) $$ -GAN type Inception Score (L2-Net) -MNIST(ref) 9.67 -cGAN 6.01 -cGAN+VB 6.2 -cGAN+LS 6.3 -cGAN+VB+LS 6.4 -cDCGAN+VB 6.5 -cDCGAN+LS 6.8 -cDCGAN+VB+LS 7.3 +GAN type Inception Score (L2-Net) Test Accuracy (L2-Net) +MNIST(ref) 9.67 1% +cGAN 6.01 2% +cGAN+VB 6.2 3% +cGAN+LS 6.3 . +cGAN+VB+LS 6.4 . +cDCGAN+VB 6.5 . +cDCGAN+LS 6.8 . +cDCGAN+VB+LS 7.3 . @@ -204,4 +214,12 @@ architecture and loss function? # Appendix +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/vanilla_gan_arc.pdf} +\caption{Vanilla GAN Architecture} +\label{fig:vanilla_gan} +\end{center} +\end{figure} + -- cgit v1.2.3-54-g00ecf From 625a86af5f3bd63f5dccbb256eb3b3849cba9da6 Mon Sep 17 00:00:00 2001 From: nunzip Date: Thu, 7 Mar 2019 21:20:11 +0000 Subject: Fix DCGAN --- report/paper.md | 37 ++++++++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 371cd7f..02a689b 100644 --- a/report/paper.md +++ b/report/paper.md @@ -100,9 +100,6 @@ Examples of this can be observed for all the output groups reported above as som specific issue is solved by training the network for more epochs or introducing a deeper architecture, as it can be deduced from a qualitative comparison between figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}.
-While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that the architecture used performed better than -the simple GAN presented in the introduction. - Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique. @@ -114,7 +111,11 @@ is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} s \end{center} \end{figure} +We evaluated the effect of different dropout rates (results in appendix, figures \ref{dcdrop1_1}, \ref{dcdrop1_2}, \ref{dcdrop2_1}, \ref{dcdrop2_2}) and concluded that the optimization +of this parameter is essential to obtain good performance: a high dropout rate would result in DCGAN producing only artifacts that do not really match any specific class due to the generator performing better than the discriminator. Conversely, a low dropout rate would lead to an initial stabilisation of G-D losses, but it would result in oscillation when training for a large number of epochs. +While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that the architecture used performed better than +the simple GAN presented in the introduction. # CGAN @@ -222,4 +223,34 @@ architecture and loss function?
\end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/dcgan_dropout01_gd.png} +\caption{DCGAN Dropout 0.1 G-D Losses} +\label{fig:dcdrop1_1} +\end{center} +\end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=14em]{fig/dcgan_dropout01.png} +\caption{DCGAN Dropout 0.1 Generated Images} +\label{fig:dcdrop1_2} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/dcgan_dropout05_gd.png} +\caption{DCGAN Dropout 0.5 G-D Losses} +\label{fig:dcdrop2_1} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=14em]{fig/dcgan_dropout05.png} +\caption{DCGAN Dropout 0.5 Generated Images} +\label{fig:dcdrop2_2} +\end{center} +\end{figure} -- cgit v1.2.3-54-g00ecf From d9026b814a09348ea59bee73f09c4095c04d61cb Mon Sep 17 00:00:00 2001 From: nunzip Date: Thu, 7 Mar 2019 22:21:34 +0000 Subject: Add cgan dropout --- report/fig/cgan_dropout01.png | Bin 0 -> 19085 bytes report/fig/cgan_dropout01_ex.png | Bin 0 -> 14640 bytes report/fig/cgan_dropout05.png | Bin 0 -> 20612 bytes report/fig/cgan_dropout05_ex.png | Bin 0 -> 14018 bytes report/paper.md | 109 +++++++++++++++++++++++++-------------- 5 files changed, 71 insertions(+), 38 deletions(-) create mode 100644 report/fig/cgan_dropout01.png create mode 100644 report/fig/cgan_dropout01_ex.png create mode 100644 report/fig/cgan_dropout05.png create mode 100644 report/fig/cgan_dropout05_ex.png (limited to 'report/paper.md') diff --git a/report/fig/cgan_dropout01.png b/report/fig/cgan_dropout01.png new file mode 100644 index 0000000..450deaf Binary files /dev/null and b/report/fig/cgan_dropout01.png differ diff --git a/report/fig/cgan_dropout01_ex.png b/report/fig/cgan_dropout01_ex.png new file mode 100644 index 0000000..2bbf777 Binary files /dev/null and b/report/fig/cgan_dropout01_ex.png differ diff --git a/report/fig/cgan_dropout05.png b/report/fig/cgan_dropout05.png new file mode 100644 index 0000000..0fe282f 
Binary files /dev/null and b/report/fig/cgan_dropout05.png differ diff --git a/report/fig/cgan_dropout05_ex.png b/report/fig/cgan_dropout05_ex.png new file mode 100644 index 0000000..b9f83fd Binary files /dev/null and b/report/fig/cgan_dropout05_ex.png differ diff --git a/report/paper.md b/report/paper.md index 02a689b..7a26e55 100644 --- a/report/paper.md +++ b/report/paper.md @@ -7,29 +7,11 @@ Generative Adversarial Networks present a system of models which learn to output GAN's employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator. -### Mode Collapse - Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse and generating low quality images due to unbalanced G-D losses. Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200,000 iterations of the GAN network presented in the appendix, figure \ref{fig:vanilla_gan}. The output of the generator represents only a few of the labels originally fed. At that point the loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}. We observe the discriminator loss tending to zero as it learns to classify the fake 1's, while the generator is stuck producing 1's. -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/generic_gan_loss.png} -\caption{Shallow GAN D-G Loss} -\label{fig:vanilla_loss} -\end{center} -\end{figure} - -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf} -\caption{Shallow GAN mode collapse} -\label{fig:mode_collapse} -\end{center} -\end{figure} - A significant improvement to this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN).
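The zero-sum game between the discriminator and generator described in the hunk above can be made concrete with a few lines of NumPy. This fragment is purely illustrative (it is not part of this repository, and the discriminator outputs are made-up numbers); it shows the two opposing binary cross-entropy losses whose balance the G-D loss curves in this report track:

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy between predicted probabilities p and targets y."""
    eps = 1e-7  # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# hypothetical discriminator outputs: D(x) on real images, D(G(z)) on fakes
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# discriminator loss: real images labelled 1, generated images labelled 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

# generator loss (non-saturating form): wants D(G(z)) to be classified as 1
g_loss = bce(d_fake, np.ones_like(d_fake))

print(round(d_loss, 3), round(g_loss, 3))  # 0.253 2.303
```

When the discriminator wins decisively (D(G(z)) near 0) the generator loss grows large, which is exactly the unbalanced G-D loss situation the shallow GAN suffers from.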
# DCGAN @@ -64,15 +46,6 @@ We propose 3 different architectures, varying the size of convolutional layers i \item Deep: Conv512-Conv256 \end{itemize} -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/short_dcgan_ex.pdf} -\includegraphics[width=24em]{fig/short_dcgan.png} -\caption{Shallow DCGAN} -\label{fig:dcshort} -\end{center} -\end{figure} - \begin{figure} \begin{center} \includegraphics[width=24em]{fig/med_dcgan_ex.pdf} @@ -82,15 +55,6 @@ We propose 3 different architectures, varying the size of convolutional layers i \end{center} \end{figure} -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/long_dcgan_ex.pdf} -\includegraphics[width=24em]{fig/long_dcgan.png} -\caption{Deep DCGAN} -\label{fig:dclong} -\end{center} -\end{figure} - It is possible to notice that deeper architectures make it easier to balance G-D losses. Medium DCGAN achieves a very good performance, balancing both binary cross entropy losses at around 1 after 5,000 epochs, showing significantly lower oscillation for longer training even when compared to Deep DCGAN. @@ -98,7 +62,7 @@ Deep DCGAN. Since we are training with no labels, the generator will simply try to output images that fool the discriminator, but do not directly map to one specific class. Examples of this can be observed for all the output groups reported above as some of the shapes look very odd (but smooth enough to be labelled as real). This specific issue is solved by training the network for more epochs or introducing a deeper architecture, as it can be deduced from a qualitative comparison -between figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}. +between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}. Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation.
Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique. @@ -111,7 +75,7 @@ is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} s \end{center} \end{figure} -We evaluated the effect of different dropout rates (results in appendix, figures \ref{dcdrop1_1}, \ref{dcdrop1_2}, \ref{dcdrop2_1}, \ref{dcdrop2_2}) and concluded that the optimization +We evaluated the effect of different dropout rates (results in appendix, figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimization of this parameter is essential to obtain good performance: a high dropout rate would result in DCGAN producing only artifacts that do not really match any specific class due to the generator performing better than the discriminator. Conversely, a low dropout rate would lead to an initial stabilisation of G-D losses, but it would result in oscillation when training for a large number of epochs. While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that the architecture used performed better than @@ -128,6 +92,8 @@ smoothing**, **virtual batch normalization**, balancing G and D. Please perform qualitative analyses on the generated images, and discuss, with results, what challenge and how they are specifically addressing. Is there the **mode collapse issue?** +Dropout in the non-convolutional CGAN architecture does not affect performance as much as in DCGAN: the images produced and the G-D losses remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. + # Inception Score @@ -223,6 +189,22 @@ architecture and loss function?
\end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/generic_gan_loss.png} +\caption{Shallow GAN D-G Loss} +\label{fig:vanilla_loss} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf} +\caption{Shallow GAN mode collapse} +\label{fig:mode_collapse} +\end{center} +\end{figure} + \begin{figure} \begin{center} \includegraphics[width=24em]{fig/dcgan_dropout01_gd.png} @@ -254,3 +236,54 @@ architecture and loss function? \label{fig:dcdrop2_2} \end{center} \end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/short_dcgan_ex.pdf} +\includegraphics[width=24em]{fig/short_dcgan.png} +\caption{Shallow DCGAN} +\label{fig:dcshort} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/long_dcgan_ex.pdf} +\includegraphics[width=24em]{fig/long_dcgan.png} +\caption{Deep DCGAN} +\label{fig:dclong} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/cgan_dropout01.png} +\caption{CGAN Dropout 0.1 G-D Losses} +\label{fig:cg_drop1_1} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=14em]{fig/cgan_dropout01_ex.png} +\caption{CGAN Dropout 0.1 Generated Images} +\label{fig:cg_drop1_2} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/cgan_dropout05.png} +\caption{CGAN Dropout 0.5 G-D Losses} +\label{fig:cg_drop2_1} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=14em]{fig/cgan_dropout05_ex.png} +\caption{CGAN Dropout 0.5 Generated Images} +\label{fig:cg_drop2_2} +\end{center} +\end{figure} + -- cgit v1.2.3-54-g00ecf From 3fef722ed752d2369d62c893cc0c4d610a04921a Mon Sep 17 00:00:00 2001 From: nunzip Date: Thu, 7 Mar 2019 23:59:26 +0000 Subject: Fix pictures --- report/paper.md | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file 
changed, 86 insertions(+), 9 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 7a26e55..f079f95 100644 --- a/report/paper.md +++ b/report/paper.md @@ -3,6 +3,7 @@ In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataaset and evaluate performance metrics across various optimisations techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits. ## GAN + Generative Adversarial Networks present a system of models which learn to output data, similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and ideally features as the samples it has been trained with. GAN's employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator. @@ -85,6 +86,23 @@ the simple GAN presented in the introduction. ## CGAN Architecture description +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/CGAN_arch.pdf} +\caption{CGAN Architecture} +\label{fig:cganarc} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/CDCGAN_arch.pdf} +\caption{Deep Convolutional CGAN Architecture} +\label{fig:cdcganarc} +\end{center} +\end{figure} + + ## Tests on MNIST Try **different architectures, hyper-parameters**, and, if necessary, the aspects of **one-sided label @@ -94,8 +112,25 @@ challenge and how they are specifically addressing. Is there the **mode collapse Dropout in the non-convolutional CGAN architecture does not affect performance as much as in DCGAN: the images produced and the G-D losses remain almost unchanged.
Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. -# Inception Score +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/med_cgan_ex.pdf} +\includegraphics[width=24em]{fig/med_cgan.png} +\caption{Medium CGAN} +\label{fig:cmed} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/smoothing_ex.pdf} +\includegraphics[width=24em]{fig/smoothing.png} +\caption{One sided label smoothing} +\label{fig:smooth} +\end{center} +\end{figure} +# Inception Score ## Classifier Architecture Used @@ -134,6 +169,30 @@ cDCGAN+VB+LS 7.3 . Retrain with different portions and test BOTH fake and real queries. Please **vary** the portions of the real training and synthetic images, e.g. 10%, 20%, 50%, and 100%, of each. +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/mix.png} +\caption{Mix training} +\label{fig:mix1} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/mix_zoom.png} +\caption{Mix training zoom} +\label{fig:mix2} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/mix_scores.png} +\caption{Mix training scores} +\label{fig:mix3} +\end{center} +\end{figure} + ## Adapted Training Strategy *Using even a small number of real samples per class would already give a high recognition rate, @@ -205,6 +264,24 @@ architecture and loss function? 
\end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/short_dcgan_ex.pdf} +\includegraphics[width=24em]{fig/short_dcgan.png} +\caption{Shallow DCGAN} +\label{fig:dcshort} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/long_dcgan_ex.pdf} +\includegraphics[width=24em]{fig/long_dcgan.png} +\caption{Deep DCGAN} +\label{fig:dclong} +\end{center} +\end{figure} + \begin{figure} \begin{center} \includegraphics[width=24em]{fig/dcgan_dropout01_gd.png} @@ -239,19 +316,19 @@ architecture and loss function? \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/short_dcgan_ex.pdf} -\includegraphics[width=24em]{fig/short_dcgan.png} -\caption{Shallow DCGAN} -\label{fig:dcshort} +\includegraphics[width=24em]{fig/short_cgan_ex.pdf} +\includegraphics[width=24em]{fig/short_cgan.png} +\caption{Shallow CGAN} +\label{fig:cshort} \end{center} \end{figure} \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/long_dcgan_ex.pdf} -\includegraphics[width=24em]{fig/long_dcgan.png} -\caption{Deep DCGAN} -\label{fig:dclong} +\includegraphics[width=24em]{fig/long_cgan_ex.pdf} +\includegraphics[width=24em]{fig/long_cgan.png} +\caption{Deep CGAN} +\label{fig:clong} \end{center} \end{figure} -- cgit v1.2.3-54-g00ecf From 1ea8da8eef6b424794c01b9ebb23bd674ed90b20 Mon Sep 17 00:00:00 2001 From: nunzip Date: Fri, 8 Mar 2019 10:37:52 +0000 Subject: Add complete inception table --- report/paper.md | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index f079f95..54f25db 100644 --- a/report/paper.md +++ b/report/paper.md @@ -150,17 +150,22 @@ with L2-Net logits. $$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) \right) ) $$ -GAN type Inception Score (L2-Net) Test Accuracy (L2-Net) -MNIST(ref) 9.67 1% -cGAN 6.01 2% -cGAN+VB 6.2 3% -cGAN+LS 6.3 . -cGAN+VB+LS 6.4 . 
-cDCGAN+VB 6.5 . -cDCGAN+LS 6.8 . -cDCGAN+VB+LS 7.3 . - - +\begin{table}[] +\begin{tabular}{lll} + & \begin{tabular}[c]{@{}l@{}}Test Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception Score \\ (L2-Net)\end{tabular} \\ \hline + Shallow CGAN & 0.7031 & 5.8 \\ + Medium CGAN & 0.7837 & 6.09 \\ + Deep CGAN & 0.8038 & 6.347 \\ + Convolutional CGAN & 0.7714 & 6.219 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label smoothing\end{tabular} & 0.8268 & 6.592 \\ + \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label smoothing\end{tabular} & 0.821 & 7.944 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.7697 & 6.341 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.751 & 6.16 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch Normalization\end{tabular} & 0.787 & 6.28 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch Normalization\\ One-sided label smoothing\end{tabular} & 0.829 & 6.62 \\ + *MNIST original test set & 0.9846 & 9.685 + \end{tabular} + \end{table} # Re-training the handwritten digit classifier -- cgit v1.2.3-54-g00ecf From 3adb475617e8dd8e53335e834083e6c5348110a5 Mon Sep 17 00:00:00 2001 From: nunzip Date: Fri, 8 Mar 2019 19:21:44 +0000 Subject: Update table --- report/paper.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 54f25db..984debf 100644 --- a/report/paper.md +++ b/report/paper.md @@ -151,21 +151,21 @@ with L2-Net logits. 
$$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) \right) ) $$ \begin{table}[] -\begin{tabular}{lll} - & \begin{tabular}[c]{@{}l@{}}Test Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception Score \\ (L2-Net)\end{tabular} \\ \hline - Shallow CGAN & 0.7031 & 5.8 \\ - Medium CGAN & 0.7837 & 6.09 \\ - Deep CGAN & 0.8038 & 6.347 \\ - Convolutional CGAN & 0.7714 & 6.219 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label smoothing\end{tabular} & 0.8268 & 6.592 \\ - \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label smoothing\end{tabular} & 0.821 & 7.944 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.7697 & 6.341 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.751 & 6.16 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch Normalization\end{tabular} & 0.787 & 6.28 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch Normalization\\ One-sided label smoothing\end{tabular} & 0.829 & 6.62 \\ - *MNIST original test set & 0.9846 & 9.685 - \end{tabular} - \end{table} +\begin{tabular}{llll} + & \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline + Shallow CGAN & 0.645 & 3.57 & 8:14 \\ + Medium CGAN & 0.715 & 3.79 & 10:23 \\ + Deep CGAN & 0.739 & 3.85 & 16:27 \\ + Convolutional CGAN & 0.737 & 4 & 25:27 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ + \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ 
Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? \\ + *MNIST original & 0.9846 & 9.685 & N/A + \end{tabular} + \end{table} # Re-training the handwritten digit classifier -- cgit v1.2.3-54-g00ecf From 434679320585d08733246dc83eb7844d9b386d90 Mon Sep 17 00:00:00 2001 From: nunzip Date: Sun, 10 Mar 2019 13:11:03 +0000 Subject: Write part 4 and add figures --- report/fig/added_generated_data.png | Bin 0 -> 21511 bytes report/fig/fake_only.png | Bin 0 -> 14446 bytes report/fig/fine_tuning.png | Bin 0 -> 17374 bytes report/fig/initialization.png | Bin 0 -> 18564 bytes report/fig/retrain_fail.png | Bin 0 -> 12925 bytes report/fig/train_few_real.png | Bin 0 -> 16790 bytes report/fig/training_mixed.png | Bin 0 -> 15373 bytes report/paper.md | 75 +++++++++++++++++++++++++++--------- 8 files changed, 56 insertions(+), 19 deletions(-) create mode 100644 report/fig/added_generated_data.png create mode 100644 report/fig/fake_only.png create mode 100644 report/fig/fine_tuning.png create mode 100644 report/fig/initialization.png create mode 100644 report/fig/retrain_fail.png create mode 100644 report/fig/train_few_real.png create mode 100644 report/fig/training_mixed.png (limited to 'report/paper.md') diff --git a/report/fig/added_generated_data.png b/report/fig/added_generated_data.png new file mode 100644 index 0000000..37c3e1e Binary files /dev/null and b/report/fig/added_generated_data.png differ diff --git a/report/fig/fake_only.png b/report/fig/fake_only.png new file mode 100644 index 0000000..27ceba1 Binary files /dev/null and b/report/fig/fake_only.png differ diff --git a/report/fig/fine_tuning.png b/report/fig/fine_tuning.png new file mode 100644 index 0000000..98caa69 Binary files /dev/null and b/report/fig/fine_tuning.png differ diff --git a/report/fig/initialization.png b/report/fig/initialization.png new file mode 100644 index 
0000000..79b2f07 Binary files /dev/null and b/report/fig/initialization.png differ diff --git a/report/fig/retrain_fail.png b/report/fig/retrain_fail.png new file mode 100644 index 0000000..2a71fd4 Binary files /dev/null and b/report/fig/retrain_fail.png differ diff --git a/report/fig/train_few_real.png b/report/fig/train_few_real.png new file mode 100644 index 0000000..5a1f940 Binary files /dev/null and b/report/fig/train_few_real.png differ diff --git a/report/fig/training_mixed.png b/report/fig/training_mixed.png new file mode 100644 index 0000000..868cbf1 Binary files /dev/null and b/report/fig/training_mixed.png differ diff --git a/report/paper.md b/report/paper.md index 984debf..81be991 100644 --- a/report/paper.md +++ b/report/paper.md @@ -176,40 +176,62 @@ of the real training and synthetic images, e.g. 10%, 20%, 50%, and 100%, of each \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/mix.png} -\caption{Mix training} +\includegraphics[width=12em]{fig/mix_zoom.png} +\includegraphics[width=12em]{fig/added_generated_data.png} +\caption{Mixed data; left: unchanged number of samples, right: added samples} \label{fig:mix1} \end{center} \end{figure} +## Adapted Training Strategy + +For this section we will use 550 samples from MNIST (55 samples per class). Training the classifier +yields major challenges, since the amount of samples available for training is relatively small. + +Training for 100 epochs, similarly to the previous section, is clearly not enough. The MNIST test set accuracy reached in this case +is only 62%, while training for 300 epochs we can reach up to 88%. The learning curve in figure \ref{fig:few_real} suggests +we cannot achieve much better with this very small amount of data, since the validation accuracy flattens, while the training accuracy +almost reaches 100%.
+ \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/mix_zoom.png} -\caption{Mix training zoom} -\label{fig:mix2} +\includegraphics[width=24em]{fig/train_few_real.png} +\caption{Training with few real samples} +\label{fig:few_real} \end{center} \end{figure} +We conduct one experiment, feeding the test set to an L2-Net trained exclusively on data generated from our CGAN. It is noticeable that training +for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained while training the network with only the few real samples. This +indicates that we can use the generated data to train the first steps of the network (initial weights) and apply the real samples for 300 epochs to obtain +finer tuning. As observed in figure \ref{fig:few_init}, the first steps of retraining will show oscillation, since the fine tuning tries to adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successful at +improving testing accuracy. + \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/mix_scores.png} -\caption{Mix training scores} -\label{fig:mix3} +\includegraphics[width=24em]{fig/initialization.png} +\caption{Retraining with initialization from generated samples} +\label{fig:few_init} \end{center} \end{figure} -## Adapted Training Strategy -*Using even a small number of real samples per class would already give a high recognition rate, -which is difficult to improve. Use few real samples per class, and, plenty generated images in a -good quality and see if the testing accuracy can be improved or not, over the model trained using -the few real samples only. -Did you have to change the strategy in training the classification network in order to improve the -testing accuracy? For example, use synthetic data to initialise the network parameters followed -by fine tuning the parameters with real data set.
Or using realistic synthetic data based on the -confidence score from the classification network pre-trained on real data. If yes, please then -specify your training strategy in details. -Analyse and discuss the outcome of the experimental result.* +We try to improve the results obtained earlier by retraining L2-Net with mixed data: few real samples and plenty of generated samples (160,000) +(learning curve shown in figure \ref{fig:training_mixed}). The peak accuracy reached is 91%. We then try to remove the generated +samples to apply fine tuning, using only the real samples. After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is +boosted to 92%, making this technique the most successful attempt at improvement while using a limited amount of data from the MNIST dataset. + +\begin{figure} +\begin{center} +\includegraphics[width=12em]{fig/training_mixed.png} +\includegraphics[width=12em]{fig/fine_tuning.png} +\caption{Retraining; mixed initialization (left), fine tuning (right)} +\label{fig:training_mixed} +\end{center} +\end{figure} + +Examples of classification failures are displayed in figure \ref{fig:retrain_fail}. These results indicate that the network we trained is actually performing quite well, +as most of the testing images that got misclassified (mainly nines and fours) show ambiguities. # Bonus @@ -369,3 +391,18 @@ architecture and loss function?
\end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/fake_only.png} +\caption{Retraining with generated samples only} +\label{fig:fake_only} +\end{center} +\end{figure} + +\begin{figure} +\begin{center} +\includegraphics[width=12em]{fig/retrain_fail.png} +\caption{Retraining failures} +\label{fig:retrain_fail} +\end{center} +\end{figure} -- cgit v1.2.3-54-g00ecf From 47e6ea316baeba86c6df12634ffbeab2a1da8b73 Mon Sep 17 00:00:00 2001 From: nunzip Date: Sun, 10 Mar 2019 13:23:02 +0000 Subject: Finish part 4 --- report/paper.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 81be991..d058051 100644 --- a/report/paper.md +++ b/report/paper.md @@ -171,8 +171,15 @@ $$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) ## Results -Retrain with different portions and test BOTH fake and real queries. Please **vary** the portions -of the real training and synthetic images, e.g. 10%, 20%, 50%, and 100%, of each. +In this section we analyze the effect of retraining the classification network using a mix of real and generated data, highlighting the benefits of +injecting generated samples in the original training set to boost testing accuracy. + +As observed in figure \ref{fig:mix1} we performed two experiments for performance evaluation: + +\begin{itemize} +\item Keeping the same number of training samples while just changing the amount of real to generated data (55.000 samples in total). +\item Keeping the whole training set from MNIST and adding generated samples from CGAN. +\end{itemize} \begin{figure} \begin{center} @@ -183,6 +190,9 @@ of the real training and synthetic images, e.g. 
10%, 20%, 50%, and 100%, of each
\end{center}
\end{figure}

+Both experiments show that an optimal amount of data to boost testing accuracy on the original MNIST dataset is around 30% generated data as in both cases we observe
+an increase in accuracy by around 0.3%. In the absence of original data the testing accuracy drops significantly to around 20% in both cases.
+
## Adapted Training Strategy

For this section we will use 550 samples from MNIST (55 samples per class). Training the classifier
-- cgit v1.2.3-54-g00ecf


From fbda0ec642721980cf5ee70dfb9ef9cdf2fdd26f Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Sun, 10 Mar 2019 15:35:18 +0000
Subject: Improve first sections

---
 report/paper.md | 79 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

(limited to 'report/paper.md')

diff --git a/report/paper.md b/report/paper.md
index d058051..53cdb3f 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,17 +1,25 @@
# Introduction

-In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataaset and evaluate performance metrics across various optimisations techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.
+In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset, and evaluate performance metrics across various optimisation techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.

## GAN

-Generative Adversarial Networks present a system of models which learn to output data, similar to training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and ideally features as the samples it has been trained with.
+Generative Adversarial Networks present a system of models which learn to output data similar to the training data. A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with.

GAN's employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator.

-Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse and generating low quality images due to unbalanced G-D losses.
+Training a shallow GAN with no convolutional layers poses problems such as mode collapse and unbalanced G-D losses which lead to low quality image output.

-Mode collapse can be observed in figure \ref{fig:mode_collapse}, after 200.000 iterations of the GAN network presented in appendix, figure \ref{fig:vanilla_gan} . The output of the generator only represents few of the labels originally fed. At that point the loss function of the generator stops
-improving as shown in figure \ref{fig:vanilla_loss}. We observe, the discriminator loss tentding to zero as it learns ti classify the fake 1's, while the generator is stuck producing 1's.
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf}
+\caption{Vanilla GAN mode collapse}
+\label{fig:mode_collapse}
+\end{center}
+\end{figure}
+
+
+Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanilla_gan}) implementation after 200,000 epochs. The generated images observed during a mode collapse can be seen in figure \ref{fig:mode_collapse}. The output of the generator represents only a few of the labels originally fed. When mode collapse is reached, the loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}.
We observe that the discriminator loss tends to zero as the discriminator learns to classify the fake 1's, while the generator is stuck producing 1's and hence unable to improve.

A significant improvement to this vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN).

@@ -39,7 +47,7 @@ The main architecture used can be observed in figure \ref{fig:dcganarc}.

## Tests on MNIST

-We propose 3 different architectures, varying the size of convolutional layers in the generator, while retaining the structure proposed in figure \ref{fig:dcganarc}:
+We evaluate three different GAN architectures, varying the size of convolutional layers in the generator, while retaining the structure presented in figure \ref{fig:dcganarc}:

\begin{itemize}
\item Shallow: Conv128-Conv64
@@ -56,17 +64,13 @@ We propose 3 different architectures, varying the size of convolutional layers i
\end{center}
\end{figure}

-It is possible to notice that using deeper architectures it is possible to balance G-D losses more easilly. Medium DCGAN achieves a very good performance,
-balancing both binary cross entropy losses ar around 1 after 5.000 epochs, showing significantly lower oscillation for longer training even when compared to
-Deep DCGAN.
+We observed that the deep architectures result in a more easily achievable equilibrium of G-D losses.
+Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5,000 epochs, reaching equilibrium quicker and with less oscillation than the deepest DCGAN tested.

-Since we are training with no labels, the generator will simply try to output images that fool the discriminator, but do not directly map to one specific class.
-Examples of this can be observed for all the output groups reported above as some of the shapes look very odd (but smooth enough to be labelled as real).
This
-specific issue is solved by training the network for more epochs or introducing a deeper architecture, as it can be deducted from a qualitative comparison
+As DCGAN is trained with no labels, the generator's primary objective is to output images that fool the discriminator, but it does not intrinsically separate the classes from one another. Therefore we sometimes observe oddly shaped, fused digits which may temporarily be labelled as real by the discriminator. This issue is solved by training the network for more epochs or introducing a deeper architecture, as can be deduced from a qualitative comparison
between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}.

-Applying Virtual Batch Normalization on Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it
-is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique.
+Applying Virtual Batch Normalization to our Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows the results of introducing this technique.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/vbn_dc.pdf}
\caption{DCGAN Virtual Batch Normalization}
\label{fig:vbn_dc}
\end{center}
\end{figure}

-We evaluated the effect of different dropout rates (results in appendix, figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimization
-of this parameter is essential to obtain good performance: a high dropout rate would result in DCGAN producing only artifacts that do not really match any specific class due to the generator performing better than the discriminator.
Conversely a low dropout rate would lead to an initial stabilisation of G-D losses, but it would result into oscillation when training for a large number of epochs.
+We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation
+of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of epochs.

-While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that the architecture used performed better than
-the simple GAN presented in the introduction.
+While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating that DCGAN is less prone to mode collapse than our *vanilla GAN*.

# CGAN
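Several of the CGAN variants benchmarked below use one-sided label smoothing: soften only the discriminator's targets for real images, leaving fake targets at exactly zero. A minimal sketch, where the batch size and the smoothing value 0.9 are illustrative assumptions:

```python
import numpy as np

def discriminator_targets(n_real, n_fake, smooth=0.9):
    """One-sided label smoothing: soften only the 'real' targets.

    Fake targets stay at exactly 0; smoothing them as well tends to hurt
    training, hence 'one-sided'."""
    real = np.full(n_real, smooth)  # instead of the hard target 1.0
    fake = np.zeros(n_fake)
    return real, fake

real_t, fake_t = discriminator_targets(64, 64)
```

These arrays replace the usual hard 1/0 targets when computing the discriminator's binary cross entropy.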
$$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) \right) ) $$ +``` \begin{table}[] \begin{tabular}{llll} - & \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline - Shallow CGAN & 0.645 & 3.57 & 8:14 \\ - Medium CGAN & 0.715 & 3.79 & 10:23 \\ - Deep CGAN & 0.739 & 3.85 & 16:27 \\ - Convolutional CGAN & 0.737 & 4 & 25:27 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ - \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? 
\\ - *MNIST original & 0.9846 & 9.685 & N/A - \end{tabular} - \end{table} +& \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline + Shallow CGAN & 0.645 & 3.57 & 8:14 \\ + Medium CGAN & 0.715 & 3.79 & 10:23 \\ + Deep CGAN & 0.739 & 3.85 & 16:27 \\ + Convolutional CGAN & 0.737 & 4 & 25:27 \\ + + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ + \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ + \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? \\ + *MNIST original & 0.9846 & 9.685 & N/A + +\end{tabular} +\end{table} +``` # Re-training the handwritten digit classifier @@ -293,14 +300,6 @@ architecture and loss function? 
\end{center} \end{figure} -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf} -\caption{Shallow GAN mode collapse} -\label{fig:mode_collapse} -\end{center} -\end{figure} - \begin{figure} \begin{center} \includegraphics[width=24em]{fig/short_dcgan_ex.pdf} -- cgit v1.2.3-54-g00ecf From da913f9a4dabab31698669b09b69a215d7947c4e Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Sun, 10 Mar 2019 17:01:42 +0000 Subject: Add TSNE and fix PCA --- lenet.py | 31 ++++++++++++++++++++----------- report/paper.md | 8 ++++++-- 2 files changed, 26 insertions(+), 13 deletions(-) (limited to 'report/paper.md') diff --git a/lenet.py b/lenet.py index 3d388de..3d9ed20 100644 --- a/lenet.py +++ b/lenet.py @@ -16,6 +16,7 @@ from sklearn.model_selection import train_test_split from sklearn.decomposition import PCA from classifier_metrics_impl import classifier_score_from_logits from sklearn.utils import shuffle +from sklearn.manifold import TSNE def import_mnist(): from tensorflow.examples.tutorials.mnist import input_data @@ -141,12 +142,12 @@ def train_classifier(x_train, y_train, x_val, y_val, batch_size=128, epochs=100, model.save_weights('./weights.h5') return model -def test_classifier(model, x_test, y_true, conf_mat=False, pca=False): +def test_classifier(model, x_test, y_true, conf_mat=False, pca=False, tsne=False): x_test = np.pad(x_test, ((0,0),(2,2),(2,2),(0,0)), 'constant') - y_pred = model.predict(x_test) - logits = tf.convert_to_tensor(y_pred, dtype=tf.float32) - inception_score = tf.keras.backend.eval(classifier_score_from_logits(logits)) - y_pred = np.argmax(y_pred, axis=1) + logits = model.predict(x_test) + tf_logits = tf.convert_to_tensor(logits, dtype=tf.float32) + inception_score = tf.keras.backend.eval(classifier_score_from_logits(tf_logits)) + y_pred = np.argmax(logits, axis=1) y_true = np.argmax(y_true, axis=1) plot_example_errors(y_pred, y_true, x_test) cm = confusion_matrix(y_true, y_pred) @@ -158,16 +159,24 @@ def 
test_classifier(model, x_test, y_true, conf_mat=False, pca=False):
         plt.show()
     if pca:
         set_pca = PCA(n_components=2)
-        pca_rep = np.reshape(x_test, (x_test.shape[0], x_test.shape[1]*x_test.shape[2]))
-        print(pca_rep.shape)
-        pca_rep = set_pca.fit_transform(pca_rep)
-        print(pca_rep.shape)
+        pca_rep = set_pca.fit_transform(logits)
         pca_rep, y_tmp = shuffle(pca_rep, y_true, random_state=0)
-        plt.scatter(pca_rep[:100, 0], pca_rep[:100, 1], c=y_true[:100], edgecolor='none', alpha=0.5, cmap=plt.cm.get_cmap('Paired', 10))
+        plt.scatter(pca_rep[:1000, 0], pca_rep[:1000, 1], c=y_true[:1000], edgecolor='none', alpha=0.5, cmap=plt.cm.get_cmap('Paired', 10))
         plt.xlabel('component 1')
         plt.ylabel('component 2')
         plt.colorbar();
         plt.show()
+    if tsne:
+        tsne = TSNE(n_components=2, random_state=0)
+        components = tsne.fit_transform(logits)
+        print(components.shape)
+        components, y_tmp = shuffle(components, y_true, random_state=0)
+        plt.scatter(components[:1000, 0], components[:1000, 1], c=y_true[:1000], edgecolor='none', alpha=0.5, cmap=plt.cm.get_cmap('Paired', 10))
+        plt.xlabel('component 1')
+        plt.ylabel('component 2')
+        plt.colorbar();
+        plt.show()
+
     return accuracy_score(y_true, y_pred), inception_score

@@ -202,4 +211,4 @@ if __name__ == '__main__':
     x_train, y_train, x_val, y_val, x_t, y_t = import_mnist()
     print(y_t.shape)
     model = train_classifier(x_train[:100], y_train[:100], x_val, y_val, epochs=3)
-    print(test_classifier(model, x_t, y_t, pca=True))
+    print(test_classifier(model, x_t, y_t, pca=False, tsne=True))
diff --git a/report/paper.md b/report/paper.md
index 53cdb3f..e053353 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -151,7 +151,7 @@ architectures in Q2.**
 We measure the performance of the considered GAN's using the Inception score [-inception], as calculated
with L2-Net logits.
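The Inception score referred to here can be computed directly from classifier logits; the following NumPy sketch is a simplified, single-split stand-in for the `classifier_score_from_logits` call used in `lenet.py` (the softmax, epsilon and shapes are illustrative):

```python
import numpy as np

def inception_score(logits, eps=1e-12):
    """IS = exp(E_x[ KL(p(y|x) || p(y)) ]) computed from raw logits."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p_yx = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p_y = p_yx.mean(axis=0)                               # marginal label distribution
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Flat, identical predictions carry no class information, so IS collapses to 1;
# confident predictions spread evenly over 10 classes approach the maximum of 10.
assert abs(inception_score(np.zeros((100, 10))) - 1.0) < 1e-6
assert abs(inception_score(50 * np.tile(np.eye(10), (10, 1))) - 10.0) < 1e-3
```

The two asserts make the score's range explicit: between 1 (uninformative generator) and the number of classes (confident, diverse generator).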
-$$ \textrm{IS}(x) = \exp(\mathcal{E}_x \left( \textrm{KL} ( p(y\|x) \|\| p(y) ) \right) ) $$ +$$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ ``` \begin{table}[] @@ -252,7 +252,11 @@ as most of the testing images that got misclassified (mainly nines and fours) sh # Bonus -This is an open question. Do you have any other ideas to improve GANs or +## Relation to PCA + +Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a definite statistical procedure which perform orthogonal transformations of the data. While both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would be converging to PCA. In a more complicated system, we would ndeed to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations. + +* This is an open question. Do you have any other ideas to improve GANs or have more insightful and comparative evaluations of GANs? Ideas are not limited. 
For instance,

\begin{itemize}
-- cgit v1.2.3-54-g00ecf


From 8590fd558a5957f193c60715de21433f0c0843a6 Mon Sep 17 00:00:00 2001
From: nunzip
Date: Sun, 10 Mar 2019 17:02:07 +0000
Subject: grammar mistake correction

---
 report/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'report/paper.md')

diff --git a/report/paper.md b/report/paper.md
index d058051..35d8ba7 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -191,7 +191,7 @@ As observed in figure \ref{fig:mix1} we performed two experimenc
\end{figure}

Both experiments show that an optimal amount of data to boost testing accuracy on the original MNIST dataset is around 30% generated data as in both cases we observe
-an increase in accuracy by around 0.3%. In absence of original data the testing accuracy drops significantly to around 20% in both cases.
+an increase in accuracy by around 0.3%. In the absence of original data the testing accuracy drops significantly to around 20% for both cases.

## Adapted Training Strategy
-- cgit v1.2.3-54-g00ecf


From a1f5db1cd15800175eb0d20e8f044bab5724cb29 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Sun, 10 Mar 2019 19:47:11 +0000
Subject: Add part 5 for report

---
 report/bibliography.bib | 7 +++++++
 report/paper.md | 36 ++++++++++++++++++++++++++++++------
 report/template.latex | 1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

(limited to 'report/paper.md')

diff --git a/report/bibliography.bib b/report/bibliography.bib
index 8230369..0defd2d 100644
--- a/report/bibliography.bib
+++ b/report/bibliography.bib
@@ -1,3 +1,10 @@
+@misc{inception-note,
+Author = {Shane Barratt and Rishi Sharma},
+Title = {A Note on the Inception Score},
+Year = {2018},
+Eprint = {arXiv:1801.01973},
+}
+
 @inproceedings{km-complexity,
 author = {Inaba, Mary and Katoh, Naoki and Imai, Hiroshi},
 title = {Applications of Weighted Voronoi Diagrams and Randomization to Variance-based K-clustering: (Extended Abstract)},
diff --git a/report/paper.md
b/report/paper.md
index fbb54f3..f3d73dc 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -254,7 +254,7 @@ as most of the testing images that got misclassified (mainly nines and fours) sh

 ## Relation to PCA

-Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a definite statistical procedure which perform orthogonal transformations of the data. While both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would be converging to PCA. In a more complicated system, we would ndeed to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations.
+Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a deterministic statistical procedure which performs orthogonal transformations of the data. While both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would converge to PCA. In a more complicated system, we would need to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations.

 * This is an open question. Do you have any other ideas to improve GANs or
 have more insightful and comparative evaluations of GANs? Ideas are not limited. For instance,
@@ -273,13 +273,37 @@
Also plot the distribution of confidence scores on these real an sub-sampled examples by the classification network trained on 100% real data on two separate graphs. Explain the trends in the graphs. -\item Can we add a classification loss (using the pre-trained classifier) to CGAN, and see if this -improve? The classification loss would help the generated images maintain the class -labels, i.e. improving the inception score. What would be the respective network -architecture and loss function? - \end{itemize} +\begin{figure} + \centering + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-mnist.png}}\quad + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\ + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}} + \caption{ROC and PR curves Top: MNIST, Bottom: CGAN output} + \label{fig:features} +\end{figure} + + +\begin{figure}[!ht] + \centering + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/roc-mnist.png}}\quad + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-mnist.png}}\\ + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/roc-cgan.png}}\quad + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-cgan.png}} + \caption{ROC and PR curves Top: MNIST, Bottom: CGAN output} + \label{fig:rocpr} +\end{figure} + +## Factoring in classification loss into GAN + +Classification accuracy and Inception score can be factored into the GAN to attemp to produce more realistic images. Shane Barrat and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note]. +Nevertheless, a pretrained static classifier may be added to the GAN model, and it's loss incorporated into the loss added too the loss of the gan. 
+ +$$ L_{\textrm{total}} = \alpha L_{2-\textrm{LeNet}} + \beta L_{\textrm{generator}} $$ + + # References
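The weighted objective above can be sketched numerically as follows; the cross-entropy form of the classifier term and the default weights alpha and beta are illustrative assumptions rather than a tested configuration:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy of predicted class distributions."""
    return float(-np.log(probs[np.arange(len(labels)), labels] + eps).mean())

def total_loss(cls_probs, labels, gen_loss, alpha=0.5, beta=1.0):
    """L_total = alpha * L_classifier + beta * L_generator."""
    return alpha * cross_entropy(cls_probs, labels) + beta * gen_loss

# If the pre-trained classifier is certain and correct about every generated
# digit, the classifier term contributes (almost) nothing to the total.
probs = np.eye(10)  # one-hot predictions for labels 0..9
assert abs(total_loss(probs, np.arange(10), gen_loss=0.7) - 0.7) < 1e-9
```

In a full setup the classifier's weights would stay frozen, so this extra term only pushes the generator towards images the classifier labels correctly.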
diff --git a/report/template.latex b/report/template.latex index afc8358..52adf9f 100644 --- a/report/template.latex +++ b/report/template.latex @@ -1,4 +1,5 @@ \documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$babel-lang$,$endif$$if(papersize)$$papersize$paper,$endif$$for(classoption)$$classoption$$sep$,$endfor$]{IEEEtran} +\usepackage[caption=false]{subfig} $if(beamerarticle)$ \usepackage{beamerarticle} % needs to be loaded first \usepackage[T1]{fontenc} -- cgit v1.2.3-54-g00ecf From 3b7847633545673117eff53f66f47db519ad6cf2 Mon Sep 17 00:00:00 2001 From: nunzip Date: Sun, 10 Mar 2019 19:51:34 +0000 Subject: Rewrite table --- report/paper.md | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index f3d73dc..3853717 100644 --- a/report/paper.md +++ b/report/paper.md @@ -153,26 +153,22 @@ with L2-Net logits. $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ -``` \begin{table}[] \begin{tabular}{llll} -& \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline - Shallow CGAN & 0.645 & 3.57 & 8:14 \\ - Medium CGAN & 0.715 & 3.79 & 10:23 \\ - Deep CGAN & 0.739 & 3.85 & 16:27 \\ - Convolutional CGAN & 0.737 & 4 & 25:27 \\ - - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ - \begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? 
\\ - \begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? \\ - *MNIST original & 0.9846 & 9.685 & N/A - + & \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline +Shallow CGAN & 0.645 & 3.57 & 8:14 \\ +Medium CGAN & 0.715 & 3.79 & 10:23 \\ +Deep CGAN & 0.739 & 3.85 & 16:27 \\ +Convolutional CGAN & 0.737 & 4 & 25:27 \\ +\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ +\begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ +\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ +\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ +\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ +\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? 
\\ +*MNIST original & 0.9846 & 9.685 & N/A \end{tabular} \end{table} -``` # Re-training the handwritten digit classifier -- cgit v1.2.3-54-g00ecf From f0f101256c3c1394dd5b69998a0699cddfc0d9e6 Mon Sep 17 00:00:00 2001 From: nunzip Date: Sun, 10 Mar 2019 20:12:07 +0000 Subject: Fix table --- report/paper.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 3853717..cf479bf 100644 --- a/report/paper.md +++ b/report/paper.md @@ -155,18 +155,18 @@ $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \begin{table}[] \begin{tabular}{llll} - & \begin{tabular}[c]{@{}l@{}}Test \\ Accuracy \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Inception \\ Score \\ (L2-Net)\end{tabular} & \begin{tabular}[c]{@{}l@{}}Execution \\ time\\ (Training \\ GAN)\end{tabular} \\ \hline -Shallow CGAN & 0.645 & 3.57 & 8:14 \\ -Medium CGAN & 0.715 & 3.79 & 10:23 \\ -Deep CGAN & 0.739 & 3.85 & 16:27 \\ -Convolutional CGAN & 0.737 & 4 & 25:27 \\ -\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.749 & 3.643 & 10:42 \\ -\begin{tabular}[c]{@{}l@{}}Convolutional CGAN\\ One-sided label \\ smoothing\end{tabular} & 0.601 & 2.494 & 27:36 \\ -\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.1\end{tabular} & 0.761 & 3.836 & 10:36 \\ -\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Dropout 0.5\end{tabular} & 0.725 & 3.677 & 10:36 \\ -\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\end{tabular} & ? & ? & ? \\ -\begin{tabular}[c]{@{}l@{}}Medium CGAN\\ Virtual Batch \\ Normalization\\ One-sided label \\ smoothing\end{tabular} & ? & ? & ? \\ -*MNIST original & 0.9846 & 9.685 & N/A + & Accuracy & Inception Sc. & GAN Tr. 
Time \\ \hline +Shallow CGAN & 0.645 & 3.57 & 8:14 \\ +Medium CGAN & 0.715 & 3.79 & 10:23 \\ +Deep CGAN & 0.739 & 3.85 & 16:27 \\ +Convolutional CGAN & 0.737 & 4 & 25:27 \\ +Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\ +Convolutional CGAN+LS & 0.601 & 2.494 & 27:36 \\ +Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\ +Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\ +Medium CGAN+VBN & ? & ? & ? \\ +Medium CGAN+VBN+LS & ? & ? & ? \\ +*MNIST original & 0.9846 & 9.685 & N/A \end{tabular} \end{table} -- cgit v1.2.3-54-g00ecf From 852862440eba885b7fe274e52ae7f48f0cceb081 Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Sun, 10 Mar 2019 20:13:33 +0000 Subject: Change graph headings --- report/paper.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 3853717..134e9ba 100644 --- a/report/paper.md +++ b/report/paper.md @@ -277,18 +277,16 @@ separate graphs. Explain the trends in the graphs. \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\ \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}} - \caption{ROC and PR curves Top: MNIST, Bottom: CGAN output} + \caption{2D feature visualisations PCA: a) MNIST b) CGAN ; TSNE a) MNIST b) CGAN} \label{fig:features} \end{figure} -\begin{figure}[!ht] +\begin{figure} \centering - \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/roc-mnist.png}}\quad - \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-mnist.png}}\\ - \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/roc-cgan.png}}\quad + \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-mnist.png}}\quad \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-cgan.png}} - \caption{ROC and PR curves Top: MNIST, Bottom: CGAN output} + \caption{Precisional Recall Curves a) MNIST, b) CGAN output} \label{fig:rocpr} \end{figure} -- cgit v1.2.3-54-g00ecf From 
4ef1a0704aa8de8e31ee3ed01e0e7f5dce9daf92 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Sun, 10 Mar 2019 20:16:23 +0000
Subject: Increase graph size a little bit

---
 report/paper.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'report/paper.md')

diff --git a/report/paper.md b/report/paper.md
index eb1fc09..7dfb96c 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -277,16 +277,16 @@ separate graphs. Explain the trends in the graphs.
 \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\
 \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad
 \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}}
-  \caption{2D feature visualisations PCA: a) MNIST b) CGAN ; TSNE a) MNIST b) CGAN}
+  \caption{2D visualisations PCA: a) MNIST b) CGAN : TSNE a) MNIST b) CGAN}
   \label{fig:features}
 \end{figure}

 \begin{figure}
   \centering
-  \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-mnist.png}}\quad
-  \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pr-cgan.png}}
-  \caption{Precisional Recall Curves a) MNIST, b) CGAN output}
+  \subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-mnist.png}}\quad
+  \subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-cgan.png}}
+  \caption{Precision-Recall Curves a) MNIST : b) CGAN output}
   \label{fig:rocpr}
 \end{figure}
-- cgit v1.2.3-54-g00ecf


From a9a15ae96684f60ac54f3bcf142349c0b60f10a3 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Sun, 10 Mar 2019 20:21:33 +0000
Subject: Correct headings

---
 report/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'report/paper.md')

diff --git a/report/paper.md b/report/paper.md
index 7dfb96c..a40a1e6 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -277,7 +277,7 @@ separate graphs. Explain the trends in the graphs.
\subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\
 \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad
 \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}}
-  \caption{2D visualisations PCA: a) MNIST b) CGAN : TSNE a) MNIST b) CGAN}
+  \caption{Visualisations PCA: a) MNIST c) CGAN | TSNE b) MNIST d) CGAN}
   \label{fig:features}
 \end{figure}
-- cgit v1.2.3-54-g00ecf


From 8ea801529a17f9ba66ddfaa4299c0120a6ac36a3 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Sun, 10 Mar 2019 20:52:17 +0000
Subject: Write intro to CGAN

---
 report/paper.md | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

(limited to 'report/paper.md')

diff --git a/report/paper.md b/report/paper.md
index a40a1e6..dc9f95a 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -31,8 +31,7 @@ DCGAN exploits convolutional stride to perform downsampling and transposed convo

 We use batch normalization at the output of each convolutional layer (exception made for the output layer of the generator and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for generator) and `LeakyReLU` with slope 0.2 (for discriminator).
-The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in
-the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal droput rate of 0.25.
+The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal dropout rate of 0.25. The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`.
The main architecture used can be observed in figure \ref{fig:dcganarc}. @@ -49,11 +48,9 @@ The main architecture used can be observed in figure \ref{fig:dcganarc}. We evaluate three different GAN architectures, varying the size of convolutional layers in the generator, while retaining the structure presented in figure \ref{fig:dcganarc}: -\begin{itemize} -\item Shallow: Conv128-Conv64 -\item Medium: Conv256-Conv128 -\item Deep: Conv512-Conv256 -\end{itemize} +* Shallow: Conv128-Conv64 +* Medium: Conv256-Conv128 +* Deep: Conv512-Conv256 \begin{figure} \begin{center} @@ -89,6 +86,17 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description +CGAN is a conditional version foa Generative adversarial network which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allows CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline GAN arhitecture presents a series blocks each contained a dense layer, ReLu layer and a Batch Normalisation layer. The baseline discriminator use Dense layers, followed by ReLu and a Droupout layer. + +We evaluate permutations of the architecture involving: + +* Shallow CGAN +* Deep CGAN +* Deep Convolutional GAN +* Label Smoothing (One Sided) +* Various Dropout +* Virtual Batch Normalisation + \begin{figure} \begin{center} \includegraphics[width=24em]{fig/CGAN_arch.pdf} @@ -140,7 +148,7 @@ The effect of dropout for the non-convolutional CGAN architecture does not affec ## Results Measure the inception scores i.e. we use the class labels to -generate images in CGAN and compare them with the predicted labels of the generated images. +generate images in CGAN and compare them with the predicted labels of the generated images. 
Also report the recognition accuracies on the MNIST real testing set (10K), in comparison to the inception scores. @@ -179,10 +187,8 @@ injecting generated samples in the original training set to boost testing accura As observed in figure \ref{fig:mix1} we performed two experiments for performance evaluation: -\begin{itemize} -\item Keeping the same number of training samples while just changing the amount of real to generated data (55.000 samples in total). -\item Keeping the whole training set from MNIST and adding generated samples from CGAN. -\end{itemize} +* Keeping the same number of training samples while just changing the amount of real to generated data (55.000 samples in total). +* Keeping the whole training set from MNIST and adding generated samples from CGAN. \begin{figure} \begin{center} -- cgit v1.2.3-54-g00ecf From f602d1d2488ad249bafab18b0d55f6a5436f32f4 Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 14:47:24 +0000 Subject: Add some references and descriptoins --- report/bibliography.bib | 7 +++++++ report/paper.md | 38 +++++++++++++++----------------------- 2 files changed, 22 insertions(+), 23 deletions(-) (limited to 'report/paper.md') diff --git a/report/bibliography.bib b/report/bibliography.bib index 0defd2d..3ccece5 100644 --- a/report/bibliography.bib +++ b/report/bibliography.bib @@ -1,3 +1,10 @@ +@misc{improved, +Author = {Tim Salimans and Ian Goodfellow and Wojciech Zaremba and Vicki Cheung and Alec Radford and Xi Chen}, +Title = {Improved Techniques for Training GANs}, +Year = {2016}, +Eprint = {arXiv:1606.03498}, +} + @misc{inception-note, Author = {Shane Barratt and Rishi Sharma}, Title = {A Note on the Inception Score}, diff --git a/report/paper.md b/report/paper.md index dc9f95a..34e9b6a 100644 --- a/report/paper.md +++ b/report/paper.md @@ -86,16 +86,16 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description -CGAN is a conditional version foa 
Generative adversarial network which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allows CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline GAN arhitecture presents a series blocks each contained a dense layer, ReLu layer and a Batch Normalisation layer. The baseline discriminator use Dense layers, followed by ReLu and a Droupout layer. +CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline GAN architecture presents a series of blocks, each containing a dense layer, a ReLU layer and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by ReLU and a Dropout layer. We evaluate permutations of the architecture involving: -* Shallow CGAN -* Deep CGAN -* Deep Convolutional GAN -* Label Smoothing (One Sided) -* Various Dropout -* Virtual Batch Normalisation +* Shallow CGAN - 1 Dense-ReLU-BN block +* Deep CGAN - 5 Dense-ReLU-BN blocks +* Deep Convolutional GAN - DCGAN + conditional label input +* Label Smoothing (One Sided) - fake labels kept at 0, truth labels smoothed to $1-\alpha$ (0.9) +* Various Dropout - Use 0.1 and 0.5 Dropout parameters +* Virtual Batch Normalisation - Normalisation based on one batch [@improved] \begin{figure} \begin{center} \includegraphics[width=24em]{fig/CGAN_arch.pdf} \caption{CGAN Architecture} \label{fig:cganarc} \end{center} \end{figure} @@ -143,24 +143,13 @@ The effect of dropout for the non-convolutional CGAN architecture does not affec # Inception Score -## Classifier Architecture Used - -## Results - -Measure the inception scores i.e.
we use the class labels to -generate images in CGAN and compare them with the predicted labels of the generated images. - -Also report the recognition accuracies on the -MNIST real testing set (10K), in comparison to the inception scores. - -**Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or -architectures in Q2.** - -We measure the performance of the considered GAN's using the Inecption score [-inception], as calculated -with L2-Net logits. +Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet as the basis of the inceptioen score. +Inception score is calculated with the logits of the LeNet $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ +## Classifier Architecture Used + \begin{table}[] \begin{tabular}{llll} & Accuracy & Inception Sc. & GAN Tr. Time \\ \hline @@ -174,10 +163,13 @@ Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\ Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\ Medium CGAN+VBN & ? & ? & ? \\ Medium CGAN+VBN+LS & ? & ? & ? 
\\ -*MNIST original & 0.9846 & 9.685 & N/A +*MNIST original & 0.9846 & 9.685 & N/A \\ \hline \end{tabular} \end{table} + +**Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or + # Re-training the handwritten digit classifier ## Results -- cgit v1.2.3-54-g00ecf From 20d397edb973314ba327230f2a403ee495f38335 Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 15:11:45 +0000 Subject: Add data for VBN --- report/paper.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 34e9b6a..eaaed12 100644 --- a/report/paper.md +++ b/report/paper.md @@ -161,8 +161,8 @@ Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\ Convolutional CGAN+LS & 0.601 & 2.494 & 27:36 \\ Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\ Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\ -Medium CGAN+VBN & ? & ? & ? \\ -Medium CGAN+VBN+LS & ? & ? & ? \\ +Medium CGAN+VBN & 0.745 & 4.02 & 10:38 \\ +Medium CGAN+VBN+LS & 0.783 & 4.31 & 10:38 \\ *MNIST original & 0.9846 & 9.685 & N/A \\ \hline \end{tabular} \end{table} -- cgit v1.2.3-54-g00ecf From 5dc4f974d373b6b7dc51c64da351872e907619bf Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 15:37:58 +0000 Subject: Move some comments --- report/paper.md | 50 +++++++++++++++++++++++++++++--------------------- 1 file changed, 29 insertions(+), 21 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index eaaed12..11d8c36 100644 --- a/report/paper.md +++ b/report/paper.md @@ -113,15 +113,7 @@ We evaluate permutations of the architecture involving: \end{center} \end{figure} - -## Tests on MNIST - -Try **different architectures, hyper-parameters**, and, if necessary, the aspects of **one-sided label -smoothing**, **virtual batch normalization**, balancing G and D. 
-Please perform qualitative analyses on the generated images, and discuss, with results, what -challenge and how they are specifically addressing. Is there the **mode collapse issue?** - -The effect of dropout for the non-convolutional CGAN architecture does not affect performance as much as in DCGAN, as the images produced, together with the G-D loss remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. +## Tests on MNIST \begin{figure} \begin{center} @@ -132,23 +124,14 @@ The effect of dropout for the non-convolutional CGAN architecture does not affec \end{center} \end{figure} -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/smoothing_ex.pdf} -\includegraphics[width=24em]{fig/smoothing.png} -\caption{One sided label smoothing} -\label{fig:smooth} -\end{center} -\end{figure} - -# Inception Score +### Inception Score Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet as the basis of the inceptioen score. 
-Inception score is calculated with the logits of the LeNet +We use the logits extracted from LeNet: $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ -## Classifier Architecture Used +### Classifier Architecture Used \begin{table}[] \begin{tabular}{llll} @@ -167,9 +150,34 @@ Medium CGAN+VBN+LS & 0.783 & 4.31 & 10:38 \\ \end{tabular} \end{table} +## Discussion + +### Architecture + +### One Side Label Smoothing + +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/smoothing_ex.pdf} +\includegraphics[width=24em]{fig/smoothing.png} +\caption{One sided label smoothing} +\label{fig:smooth} +\end{center} +\end{figure} + + + +### Virtual Batch Normalisation + + +### Dropout +The effect of dropout for the non-convolutional CGAN architecture does not affect performance as much as in DCGAN, nor does it seem to affect the quality of images produced, together with the G-D loss remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. + **Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or + + # Re-training the handwritten digit classifier ## Results -- cgit v1.2.3-54-g00ecf From 8413e2f43543b36f5239e7c8477f9bbaed010022 Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 15:55:48 +0000 Subject: s/l2net/lenet/ --- report/paper.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 11d8c36..fbc4eb3 100644 --- a/report/paper.md +++ b/report/paper.md @@ -220,7 +220,7 @@ almost reaches 100%. \end{center} \end{figure} -We conduct one experiment, feeding the test set to a L2-Net trained exclusively on data generated from our CGAN. It is noticeable that training +We conduct one experiment, feeding the test set to a LeNet trained exclusively on data generated from our CGAN. 
It is noticeable that training for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained while training the network ith only the few real samples. This indicates that we can use the generated data to train the first steps of the network (initial weights) and apply the real sample for 300 epochs to obtain a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine tuning will try and adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successfull at @@ -235,7 +235,7 @@ improving testing accuracy. \end{figure} -We try to improve the results obtained earlier by retraining L2-Net with mixed data: few real samples and plenty of generated samples (160.000) +We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160.000) (learning curve show in figure \ref{fig:training_mixed}. The peak accuracy reached is 91%. We then try to remove the generated samples to apply fine tuning, using only the real samples. After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is boosted to 92%, making this technique the most successfull attempt of improvement while using a limited amount of data from MNIST dataset. @@ -298,10 +298,10 @@ separate graphs. Explain the trends in the graphs. ## Factoring in classification loss into GAN -Classification accuracy and Inception score can be factored into the GAN to attemp to produce more realistic images. Shane Barrat and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note]. +Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. 
Shane Barratt and Rishi Sharma are able to indirectly optimise the inception score to over 900, and note that directly optimising for maximised Inception score produces adversarial examples [@inception-note]. Nevertheless, a pretrained static classifier may be added to the GAN model, and its loss incorporated into the overall loss of the GAN. -$$ L_{\textrm{total}} = \alpha L_{2-\textrm{LeNet}} + \beta L_{\textrm{generator}} $$ +$$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$ # References -- cgit v1.2.3-54-g00ecf From ef97121a773ff7d5c47a8d6d68280c2bdd1e11c4 Mon Sep 17 00:00:00 2001 From: nunzip Date: Mon, 11 Mar 2019 17:16:47 +0000 Subject: Fix grammar mistakes --- report/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index fbc4eb3..6ec9f57 100644 --- a/report/paper.md +++ b/report/paper.md @@ -205,7 +205,7 @@ an increase in accuracy by around 0.3%. In absence of original data the testing ## Adapted Training Strategy For this section we will use 550 samples from MNIST (55 samples per class). Training the classifier -yelds major challanges, since the amount of samples aailable for training is relatively small. +yelds major challanges, since the amount of samples available for training is relatively small. Training for 100 epochs, similarly to the previous section, is clearly not enough. The MNIST test set accuracy reached in this case is only 62%, while training for 300 epochs we can reach up to 88%.
The learning curve in figure \ref{fig:few_real} suggests -- cgit v1.2.3-54-g00ecf From 5a3c268b381ca63908e95c201f8049b22828856e Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 17:10:47 +0000 Subject: Write some description of the data --- report/paper.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 6ec9f57..e6894bb 100644 --- a/report/paper.md +++ b/report/paper.md @@ -154,6 +154,8 @@ Medium CGAN+VBN+LS & 0.783 & 4.31 & 10:38 \\ ### Architecture +We observe increased accuracy as we increase the depth of the architecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques. + ### One Side Label Smoothing \begin{figure} \begin{center} \includegraphics[width=24em]{fig/smoothing_ex.pdf} \includegraphics[width=24em]{fig/smoothing.png} \caption{One sided label smoothing} \label{fig:smooth} \end{center} \end{figure} - +One sided label smoothing involves relaxing our confidence on the labels in our data. This lowers the loss target to below 1. Tim Salimans et al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy. ### Virtual Batch Normalisation +Virtual Batch Normalisation is a further optimisation technique proposed by Tim Salimans et al. [@improved]. Virtual batch normalisation is a modification to the batch normalisation layer, which performs normalisation based on statistics from a reference batch. We observe that VBN improved the classification accuracy and the Inception score. ### Dropout -The effect of dropout for the non-convolutional CGAN architecture does not affect performance as much as in DCGAN, nor does it seem to affect the quality of images produced, together with the G-D loss remain almost unchanged.
Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. - - -**Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or +The effect of dropout for the non-convolutional CGAN architecture does not affect performance as much as in DCGAN, nor does it seem to affect the quality of images produced, together with the G-D loss remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}. # Re-training the handwritten digit classifier @@ -252,11 +252,11 @@ boosted to 92%, making this technique the most successfull attempt of improvemen Failures classification examples are displayed in figure \ref{fig:retrain_fail}. The results showed indicate that the network we trained is actually performing quite well, as most of the testing images that got misclassified (mainly nines and fours) show ambiguities. -# Bonus +# Bonus Questions ## Relation to PCA -Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a definite statistical procedure which perform orthogonal transformations of the data. While both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would be converging to PCA. In a more complicated system, we would indeed to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations. +Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a definite statistical procedure which perform orthogonal transformations of the data. 
Both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), but PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would converge to PCA. In a more complicated system, we would need to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations. * This is an open question. Do you have any other ideas to improve GANs or have more insightful and comparative evaluations of GANs? Ideas are not limited. For instance, -- cgit v1.2.3-54-g00ecf From 205e6d4d024090f12251b61371f0290487c2798e Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 17:34:13 +0000 Subject: A few spelling fixes --- report/paper.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index e6894bb..522eaed 100644 --- a/report/paper.md +++ b/report/paper.md @@ -199,18 +199,15 @@ As observed in figure \ref{fig:mix1} we performed two experiments for performanc \end{center} \end{figure} -Both experiments show that an optimal amount of data to boost testing accuracy on the original MNIST dataset is around 30% generated data as in both cases we observe -an increase in accuracy by around 0.3%. In absence of original data the testing accuracy drops significantly to around 20% for both cases. +Both experiments show that an optimal amount of data to boost testing accuracy on the original MNIST dataset is around 30% generated data as in both cases we observe an increase in accuracy by around 0.3%. In absence of original data the testing accuracy drops significantly to around 20% for both cases. ## Adapted Training Strategy -For this section we will use 550 samples from MNIST (55 samples per class).
Training the classifier -yelds major challanges, since the amount of samples available for training is relatively small. +For this section we will use 550 samples from MNIST (55 samples per class). Training the classifier yields major challenges, since the amount of samples available for training is relatively small. Training for 100 epochs, similarly to the previous section, is clearly not enough. The MNIST test set accuracy reached in this case is only 62%, while training for 300 epochs we can reach up to 88%. The learning curve in figure \ref{fig:few_real} suggests -we cannot achieve much better whith this very small amount of data, since the validation accuracy flattens, while the training accuracy -almost reaches 100%. +we cannot achieve much better with this very small amount of data, since the validation accuracy plateaus, while the training accuracy almost reaches 100%. \begin{figure} \begin{center} @@ -221,7 +218,7 @@ almost reaches 100%. \end{figure} We conduct one experiment, feeding the test set to a LeNet trained exclusively on data generated from our CGAN. It is noticeable that training -for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained while training the network ith only the few real samples. This +for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained when training the network with only the few real samples. This indicates that we can use the generated data to train the first steps of the network (initial weights) and apply the real sample for 300 epochs to obtain a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine tuning will try and adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successfull at improving testing accuracy. 
-- cgit v1.2.3-54-g00ecf From 9eeba3d77e0dbd1610213c2857bc32fb3187db28 Mon Sep 17 00:00:00 2001 From: Vasil Zlatanov Date: Mon, 11 Mar 2019 17:40:58 +0000 Subject: Add lenet bibtex --- report/bibliography.bib | 8 ++++++++ report/paper.md | 4 +++- 2 files changed, 11 insertions(+), 1 deletion(-) (limited to 'report/paper.md') diff --git a/report/bibliography.bib b/report/bibliography.bib index 3ccece5..430d8b5 100644 --- a/report/bibliography.bib +++ b/report/bibliography.bib @@ -1,3 +1,11 @@ +@INPROCEEDINGS{lenet, + author = {Yann Lecun and Léon Bottou and Yoshua Bengio and Patrick Haffner}, + title = {Gradient-based learning applied to document recognition}, + booktitle = {Proceedings of the IEEE}, + year = {1998}, + pages = {2278--2324} +} + @misc{improved, Author = {Tim Salimans and Ian Goodfellow and Wojciech Zaremba and Vicki Cheung and Alec Radford and Xi Chen}, Title = {Improved Techniques for Training GANs}, diff --git a/report/paper.md b/report/paper.md index 522eaed..1989472 100644 --- a/report/paper.md +++ b/report/paper.md @@ -126,11 +126,13 @@ We evaluate permutations of the architecture involving: ### Inception Score -Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet as the basis of the inceptioen score. +Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inceptioen score. We use the logits extracted from LeNet: $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ +We further report the classification accuracy as found with LeNet. 
+ ### Classifier Architecture Used \begin{table}[] -- cgit v1.2.3-54-g00ecf From 06310df0389921e20ce194370344e4f12828049b Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 13:04:41 +0000 Subject: Fix epoch/batch --- report/fig/cgan_dropout01.png | Bin 19085 -> 18171 bytes report/fig/cgan_dropout05.png | Bin 20612 -> 19569 bytes report/fig/dcgan_dropout01_gd.png | Bin 21547 -> 20802 bytes report/fig/dcgan_dropout05_gd.png | Bin 25444 -> 24782 bytes report/fig/generic_gan_loss.png | Bin 32275 -> 28806 bytes report/fig/long_cgan.png | Bin 23641 -> 22301 bytes report/fig/long_cgan_ex.png | Bin 0 -> 90457 bytes report/fig/long_dcgan.png | Bin 18557 -> 17753 bytes report/fig/long_dcgan_ex.png | Bin 0 -> 142641 bytes report/fig/med_cgan.png | Bin 19123 -> 18352 bytes report/fig/med_cgan_ex.png | Bin 0 -> 84936 bytes report/fig/med_dcgan.png | Bin 18041 -> 17503 bytes report/fig/med_dcgan_ex.png | Bin 0 -> 186851 bytes report/fig/short_cgan.png | Bin 26839 -> 24681 bytes report/fig/short_cgan_ex.png | Bin 0 -> 79789 bytes report/fig/short_dcgan.png | Bin 22431 -> 20998 bytes report/fig/short_dcgan_ex.png | Bin 0 -> 158578 bytes report/fig/smoothing.png | Bin 18734 -> 17544 bytes report/fig/smoothing_ex.png | Bin 0 -> 96210 bytes report/paper.md | 22 +++++++++++----------- 20 files changed, 11 insertions(+), 11 deletions(-) create mode 100644 report/fig/long_cgan_ex.png create mode 100644 report/fig/long_dcgan_ex.png create mode 100644 report/fig/med_cgan_ex.png create mode 100644 report/fig/med_dcgan_ex.png create mode 100644 report/fig/short_cgan_ex.png create mode 100644 report/fig/short_dcgan_ex.png create mode 100644 report/fig/smoothing_ex.png (limited to 'report/paper.md') diff --git a/report/fig/cgan_dropout01.png b/report/fig/cgan_dropout01.png index 450deaf..4c97618 100644 Binary files a/report/fig/cgan_dropout01.png and b/report/fig/cgan_dropout01.png differ diff --git a/report/fig/cgan_dropout05.png b/report/fig/cgan_dropout05.png index 
0fe282f..a0baff0 100644 Binary files a/report/fig/cgan_dropout05.png and b/report/fig/cgan_dropout05.png differ diff --git a/report/fig/dcgan_dropout01_gd.png b/report/fig/dcgan_dropout01_gd.png index 37914ff..d20f9bf 100644 Binary files a/report/fig/dcgan_dropout01_gd.png and b/report/fig/dcgan_dropout01_gd.png differ diff --git a/report/fig/dcgan_dropout05_gd.png b/report/fig/dcgan_dropout05_gd.png index d15ced2..29137b8 100644 Binary files a/report/fig/dcgan_dropout05_gd.png and b/report/fig/dcgan_dropout05_gd.png differ diff --git a/report/fig/generic_gan_loss.png b/report/fig/generic_gan_loss.png index 701b191..42716dd 100644 Binary files a/report/fig/generic_gan_loss.png and b/report/fig/generic_gan_loss.png differ diff --git a/report/fig/long_cgan.png b/report/fig/long_cgan.png index 6b80387..55ce4f8 100644 Binary files a/report/fig/long_cgan.png and b/report/fig/long_cgan.png differ diff --git a/report/fig/long_cgan_ex.png b/report/fig/long_cgan_ex.png new file mode 100644 index 0000000..053d06c Binary files /dev/null and b/report/fig/long_cgan_ex.png differ diff --git a/report/fig/long_dcgan.png b/report/fig/long_dcgan.png index 4e12495..c0cbdf9 100644 Binary files a/report/fig/long_dcgan.png and b/report/fig/long_dcgan.png differ diff --git a/report/fig/long_dcgan_ex.png b/report/fig/long_dcgan_ex.png new file mode 100644 index 0000000..2bac124 Binary files /dev/null and b/report/fig/long_dcgan_ex.png differ diff --git a/report/fig/med_cgan.png b/report/fig/med_cgan.png index b42bf7b..f7981be 100644 Binary files a/report/fig/med_cgan.png and b/report/fig/med_cgan.png differ diff --git a/report/fig/med_cgan_ex.png b/report/fig/med_cgan_ex.png new file mode 100644 index 0000000..120ad57 Binary files /dev/null and b/report/fig/med_cgan_ex.png differ diff --git a/report/fig/med_dcgan.png b/report/fig/med_dcgan.png index 9a809c9..790608b 100644 Binary files a/report/fig/med_dcgan.png and b/report/fig/med_dcgan.png differ diff --git 
a/report/fig/med_dcgan_ex.png b/report/fig/med_dcgan_ex.png new file mode 100644 index 0000000..9d7af5d Binary files /dev/null and b/report/fig/med_dcgan_ex.png differ diff --git a/report/fig/short_cgan.png b/report/fig/short_cgan.png index 2ddb5cd..4ff9c90 100644 Binary files a/report/fig/short_cgan.png and b/report/fig/short_cgan.png differ diff --git a/report/fig/short_cgan_ex.png b/report/fig/short_cgan_ex.png new file mode 100644 index 0000000..5097d80 Binary files /dev/null and b/report/fig/short_cgan_ex.png differ diff --git a/report/fig/short_dcgan.png b/report/fig/short_dcgan.png index ea8199b..d7c3326 100644 Binary files a/report/fig/short_dcgan.png and b/report/fig/short_dcgan.png differ diff --git a/report/fig/short_dcgan_ex.png b/report/fig/short_dcgan_ex.png new file mode 100644 index 0000000..56a2462 Binary files /dev/null and b/report/fig/short_dcgan_ex.png differ diff --git a/report/fig/smoothing.png b/report/fig/smoothing.png index 86de8e8..3e09cf6 100644 Binary files a/report/fig/smoothing.png and b/report/fig/smoothing.png differ diff --git a/report/fig/smoothing_ex.png b/report/fig/smoothing_ex.png new file mode 100644 index 0000000..6bddcbc Binary files /dev/null and b/report/fig/smoothing_ex.png differ diff --git a/report/paper.md b/report/paper.md index 1989472..364e6a5 100644 --- a/report/paper.md +++ b/report/paper.md @@ -19,7 +19,7 @@ Training a shallow GAN with no convolutional layers poses problems such as mode \end{figure} -Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanilla_gan}) implementation after 200,000 epochs. The generated images observed during a mode collapse can be seen on figure \ref{fig:mode_collapse}. The output of the generator only represents few of the labels originally fed. When mode collapse is reached loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}. 
We observe, the discriminator loss tends to zero as the discriminator learns to assume and classify the fake 1's, while the generator is stuck producing 1 and hence not able to improve. +Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanilla_gan}) implementation after 200,000 batches. The generated images observed during a mode collapse can be seen in figure \ref{fig:mode_collapse}. The output of the generator represents only a few of the labels originally fed. When mode collapse is reached, the loss function of the generator stops improving as shown in figure \ref{fig:vanilla_loss}. We observe that the discriminator loss tends to zero as the discriminator learns to classify the fake 1's, while the generator is stuck producing 1's and hence not able to improve. A significant improvement to this vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN). @@ -54,7 +54,7 @@ We evaluate three different GAN architectures, varying the size of convolutional \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/med_dcgan_ex.pdf} +\includegraphics[width=24em]{fig/med_dcgan_ex.png} \includegraphics[width=24em]{fig/med_dcgan.png} \caption{Medium DCGAN} \label{fig:dcmed} \end{center} \end{figure} We observed that the deep architectures result in a more easily achievable equilibria of G-D losses. -Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5.000 epochs, reaching equilibrium quicker and with less oscillation that the Deepest DCGAN tested. +Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5.000 batches, reaching equilibrium quicker and with less oscillation than the deepest DCGAN tested.
-As DCGAN is trained with no labels, the generator primary objective is to output images that fool the discriminator, but does not intrinsically separate the classes form one another. Therefore we sometimes observe oddly shape fused digits which may temporarily full be labeled real by the discriminator. This issue is solved by training the network for more epochs or introducing a deeper architecture, as it can be deducted from a qualitative comparison
+As DCGAN is trained with no labels, the generator's primary objective is to output images that fool the discriminator, but it does not intrinsically separate the classes from one another. Therefore we sometimes observe oddly shaped fused digits which may temporarily be labeled as real by the discriminator. This issue is solved by training the network for more batches or introducing a deeper architecture, as can be deduced from a qualitative comparison
between figures \ref{fig:dcmed}, \ref{fig:dcshort} and \ref{fig:dclong}.

Applying Virtual Batch Normalization our Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it
is difficult to qualitatively assess the improvements, figure \ref{fig:vbn_dc} shows results of the introduction of this technique.

@@ -78,7 +78,7 @@ Applying Virtual Batch Normalization our Medium DCGAN does not provide observabl
\end{figure}

We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation
-of the droupout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator.
Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of epochs.
+of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of batches.

While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating the DCGAN is less prone to a collapse compared to our *vanilla GAN*.

@@ -117,7 +117,7 @@ We evaluate permutations of the architecture involving:

\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/med_cgan_ex.pdf}
+\includegraphics[width=24em]{fig/med_cgan_ex.png}
\includegraphics[width=24em]{fig/med_cgan.png}
\caption{Medium CGAN}
\label{fig:cmed}
@@ -162,7 +162,7 @@ We observe increased accruacy as we increase the depth of the arhitecture at the
\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/smoothing_ex.pdf}
+\includegraphics[width=24em]{fig/smoothing_ex.png}
\includegraphics[width=24em]{fig/smoothing.png}
\caption{One sided label smoothing}
\label{fig:smooth}
@@ -329,7 +329,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/short_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/short_dcgan_ex.png}
\includegraphics[width=24em]{fig/short_dcgan.png}
\caption{Shallow DCGAN}
\label{fig:dcshort}
@@ -338,7 +338,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/long_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/long_dcgan_ex.png}
\includegraphics[width=24em]{fig/long_dcgan.png} \caption{Deep DCGAN} \label{fig:dclong} @@ -379,7 +379,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/short_cgan_ex.pdf} +\includegraphics[width=24em]{fig/short_cgan_ex.png} \includegraphics[width=24em]{fig/short_cgan.png} \caption{Shallow CGAN} \label{fig:cshort} @@ -388,7 +388,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \begin{figure} \begin{center} -\includegraphics[width=24em]{fig/long_cgan_ex.pdf} +\includegraphics[width=24em]{fig/long_cgan_ex.png} \includegraphics[width=24em]{fig/long_cgan.png} \caption{Deep CGAN} \label{fig:clong} -- cgit v1.2.3-54-g00ecf From de8c86ffee5cae9667de94530157d3ffef879ce3 Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 13:40:30 +0000 Subject: Add more details to CGAN --- report/paper.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 364e6a5..03fad67 100644 --- a/report/paper.md +++ b/report/paper.md @@ -86,7 +86,9 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description -CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline GAN arhitecture presents a series blocks each contained a dense layer, ReLu layer and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by ReLu and a Droupout layer. +CGAN is a conditional version of a GAN which utilises labeled data. 
Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline CGAN architecture presents a series of blocks, each containing a dense layer, a LeakyReLu layer and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu and a Dropout layer.
+
+The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{}.

We evaluate permutations of the architecture involving:

@@ -115,6 +117,22 @@ We evaluate permutations of the architecture involving:

## Tests on MNIST

+When comparing the three levels of depth for the architectures it is possible to notice significant differences in the G-D loss balancing. In
+a shallow architecture we notice a high oscillation of the generator loss \ref{fig:}, which is being overpowered by the discriminator. Despite this we don't
+experience any issues with vanishing gradient, hence no mode collapse is reached.
+Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20000 batches some pictures appear to be slightly blurry \ref{fig:}.
+
+The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{}, \ref{} and \ref{}, both
+image quality and G-D losses are comparable.
+
+The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels from 1 to 0.9 to incentivize the discriminator.
+Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better.
Performance results for
+one-sided label smoothing with true labels = 0.9 are shown in figure \ref{}.
+
+ADD FORMULA?
+
+ADD VBN TALKING ABOUT TIME AND RESULTS
+
\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/med_cgan_ex.png}
\includegraphics[width=24em]{fig/med_cgan.png}
\caption{Medium CGAN}
\label{fig:cmed}
\end{center}
\end{figure}

-### Inception Score
+# Inception Score

Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inception score.
We use the logits extracted from LeNet:

@@ -146,8 +164,8 @@ Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\
Convolutional CGAN+LS & 0.601 & 2.494 & 27:36 \\
Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\
Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\
-Medium CGAN+VBN & 0.745 & 4.02 & 10:38 \\
-Medium CGAN+VBN+LS & 0.783 & 4.31 & 10:38 \\
+Medium CGAN+VBN & 0.745 & 4.02 & 19:38 \\
+Medium CGAN+VBN+LS & 0.783 & 4.31 & 19:43 \\
*MNIST original & 0.9846 & 9.685 & N/A \\ \hline
\end{tabular}
\end{table}
@@ -156,7 +174,7 @@ Medium CGAN+VBN+LS & 0.783 & 4.31 & 10:38 \\
### Architecture
-We observe increased accruacy as we increase the depth of the arhitecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation technques.
+We observe increased accuracy as we increase the depth of the architecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques.
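The inception score described in this section reduces to exponentiating the average KL divergence between each sample's class posterior p(y|x) (here taken from LeNet rather than Inception) and the marginal p(y). A minimal stdlib-only sketch of that computation (function and variable names are ours, not from the project code):

```python
import math

def inception_score(posteriors):
    """posteriors: one class-probability distribution p(y|x) per sample.
    Returns exp(E_x[KL(p(y|x) || p(y))]), with p(y) the empirical marginal
    over the evaluated samples, as in Salimans et al."""
    n, k = len(posteriors), len(posteriors[0])
    # Marginal class distribution p(y), averaged over all samples.
    p_y = [sum(p[j] for p in posteriors) / n for j in range(k)]
    mean_kl = sum(
        sum(p[j] * math.log(p[j] / p_y[j]) for j in range(k) if p[j] > 0)
        for p in posteriors
    ) / n
    return math.exp(mean_kl)

# Uniform posteriors carry no class information and score exactly 1;
# confident posteriors spread over distinct classes score higher.
print(inception_score([[0.1] * 10, [0.1] * 10]))  # 1.0
```

The score therefore rewards generators whose samples are both individually recognisable (sharp p(y|x)) and collectively diverse (broad p(y)).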
### One Side Label Smoothing -- cgit v1.2.3-54-g00ecf From eaa279aa9a2732e967f503ecca6008a4e12329cf Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 15:14:12 +0000 Subject: Write more about CGAN and add figures --- report/fig/bad_ex.png | Bin 0 -> 15772 bytes report/fig/cdcgan.png | Bin 0 -> 26406 bytes report/fig/good_ex.png | Bin 0 -> 14206 bytes report/paper.md | 90 ++++++++++++++++++++++++++----------------------- 4 files changed, 48 insertions(+), 42 deletions(-) create mode 100644 report/fig/bad_ex.png create mode 100644 report/fig/cdcgan.png create mode 100644 report/fig/good_ex.png (limited to 'report/paper.md') diff --git a/report/fig/bad_ex.png b/report/fig/bad_ex.png new file mode 100644 index 0000000..bdc899e Binary files /dev/null and b/report/fig/bad_ex.png differ diff --git a/report/fig/cdcgan.png b/report/fig/cdcgan.png new file mode 100644 index 0000000..179e9a4 Binary files /dev/null and b/report/fig/cdcgan.png differ diff --git a/report/fig/good_ex.png b/report/fig/good_ex.png new file mode 100644 index 0000000..43bb567 Binary files /dev/null and b/report/fig/good_ex.png differ diff --git a/report/paper.md b/report/paper.md index 03fad67..2a14059 100644 --- a/report/paper.md +++ b/report/paper.md @@ -86,18 +86,19 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description -CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline CGAN arhitecture presents a series blocks each contained a dense layer, LeakyReLu layer and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu and a Droupout layer. 
+CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The baseline CGAN architecture presents a series of blocks, each containing a dense layer, a LeakyReLu layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu (slope=0.2) and a Dropout layer.
+The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`).

-The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{}.
+The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{fig:cdcganarc}.

We evaluate permutations of the architecture involving:

-* Shallow CGAN - 1 Dense-ReLu-BN block
-* Deep CGAN - 5 Dense-ReLu-BN
+* Shallow CGAN - 1 Dense-LeakyReLu-BN block
+* Deep CGAN - 5 Dense-LeakyReLu-BN
* Deep Convolutional GAN - DCGAN + conditional label input
-* Label Smoothing (One Sided) - Truth labels to 0 and $1-\alpha$ (0.9)
-* Various Dropout - Use 0.1 and 0.5 Dropout parameters
-* Virtual Batch Normalisation - Normalisation based on one batch [@improved]
+* One-Sided Label Smoothing (LS)
+* Various Dropout (DO) - Use 0.1 and 0.5 Dropout parameters
+* Virtual Batch Normalisation (VBN) - Normalisation based on one batch [@improved]

\begin{figure}
\begin{center}
@@ -107,31 +108,33 @@ We evaluate permutations of the architecture involving:
\end{center}
\end{figure}

-\begin{figure}
-\begin{center}
-\includegraphics[width=24em]{fig/CDCGAN_arch.pdf}
-\caption{Deep Convolutional CGAN Architecture}
-\label{fig:cdcganarc}
-\end{center}
-\end{figure}
-
## Tests on MNIST

When comparing the three levels of depth for the architectures it is possible to notice significant
differences in the G-D loss balancing. In
-a shallow architecture we notice a high oscillation of the generator loss \ref{fig:}, which is being overpowered by the discriminator. Despite this we don't
+a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't
experience any issues with vanishing gradient, hence no mode collapse is reached.
-Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20000 batches the some pictures appear to be slightly blurry \ref{fig:}.
+Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20000 batches some pictures appear to be slightly blurry \ref{fig:clong}.
+The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced,
+and their value goes below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5.
+The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise.

-The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{}, \ref{} and \ref{}, both
+The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{fig:cg_drop1_1} (0.1), \ref{fig:cmed} (0.3) and \ref{fig:cg_drop2_1} (0.5), both
image quality and G-D losses are comparable.
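The dropout rates compared in this hunk (0.1, 0.3 and 0.5) act at training time only. An inverted-dropout sketch (ours, not the project's layer implementation) makes the role of the rate explicit:

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout: zero each unit with probability `rate` and scale
    the survivors by 1/(1 - rate), so the expected activation is unchanged
    and no rescaling is needed at inference time."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# At inference the layer is a pass-through:
print(dropout([1.0, 2.0], 0.5, training=False))  # [1.0, 2.0]
```

A higher rate injects more noise into the discriminator, which is exactly the handicap that can let the generator overpower it.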
The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels form 1 to 0.9 to incentivize the discriminator. Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better. Performance results for -one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{}. +one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}. + +Virtual Batch normalization does not affect performance significantly. Applying this technique to both the CGAN architectures used keeps G-D losses +mostly unchanged. The biggest change we expect to see is a lower correlation between images in the same batch. This aspect will mostly affect +performance when training a classifier with the generated images from CGAN, as we will obtain more diverse images. Training with a larger batch size +would show more significant results, but since we set this parameter to 128 the issue of within-batch correlation is limited. -ADD FORMULA? +Convolutional CGAN did not achieve better results than our baseline approach for the architecture analyzed, although we believe that +it is possible to achieve a better performance by finer tuning of the Convolutional CGAN parameters. Figure \ref{fig:cdcloss} shows a very high oscillation +of the generator loss, hence the image quality varies a lot at each training step. Attempting LS on this architecture achieved a similar outcome +when compared to the non-convolutional counterpart. -ADD VBN TALKING ABOUT TIME AND RESULTS \begin{figure} \begin{center} @@ -155,7 +158,7 @@ We further report the classification accuracy as found with LeNet. \begin{table}[] \begin{tabular}{llll} - & Accuracy & Inception Sc. & GAN Tr. Time \\ \hline + & Accuracy & IS & GAN Tr. 
Time \\ \hline
Shallow CGAN & 0.645 & 3.57 & 8:14 \\
Medium CGAN & 0.715 & 3.79 & 10:23 \\
Deep CGAN & 0.739 & 3.85 & 16:27 \\
@@ -164,8 +167,8 @@ Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\
Convolutional CGAN+LS & 0.601 & 2.494 & 27:36 \\
Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\
Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\
-Medium CGAN+VBN & 0.745 & 4.02 & 19:38 \\
-Medium CGAN+VBN+LS & 0.783 & 4.31 & 19:43 \\
+Medium CGAN+VBN & 0.735 & 3.82 & 19:38 \\
+Medium CGAN+VBN+LS & 0.763 & 3.91 & 19:43 \\
*MNIST original & 0.9846 & 9.685 & N/A \\ \hline
\end{tabular}
\end{table}
@@ -174,7 +177,7 @@ Medium CGAN+VBN+LS & 0.783 & 4.31 & 19:43 \\
### Architecture
-We observe increased accruacy as we increase the depth of the arhitecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques.
+We observe increased accuracy as we increase the depth of the GAN architecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques.
### One Side Label Smoothing
@@ -275,24 +278,9 @@ as most of the testing images that got misclassified (mainly nines and fours) sh
Similarly to GAN's, PCA can be used to formulate **generative** models of a system. While GAN's are trained neural networks, PCA is a deterministic statistical procedure which performs orthogonal transformations of the data. Both attempt to identify the most important or *variant* features of the data (which we may then use to generate new data), but PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would be converging to PCA.
In a more complicated system, we would indeed need to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform relevant transformations.
-* This is an open question. Do you have any other ideas to improve GANs or
-have more insightful and comparative evaluations of GANs? Ideas are not limited. For instance,
-
-\begin{itemize}
+## Data representation
-\item How do you compare GAN with PCA? We leant PCA as another generative model in the
-Pattern Recognition module (EE468/EE9SO29/EE9CS729). Strengths/weaknesses?
-
-\item Take the pre-trained classification network using 100% real training examples and use it
-to extract the penultimate layer’s activations (embeddings) of 100 randomly sampled real
-test examples and 100 randomly sampled synthetic examples from all the digits i.e. 0-9.
-Use an embedding method e.g. t-sne [1] or PCA, to project them to a 2D subspace and
-plot them. Explain what kind of patterns do you observe between the digits on real and
-synthetic data. Also plot the distribution of confidence scores on these real and synthetic
-sub-sampled examples by the classification network trained on 100% real data on two
-separate graphs. Explain the trends in the graphs.
- -\end{itemize} +TODO EXPLAIN WHAT WE HAVE DONE HERE \begin{figure} \centering @@ -395,6 +383,14 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/CDCGAN_arch.pdf} +\caption{Deep Convolutional CGAN Architecture} +\label{fig:cdcganarc} +\end{center} +\end{figure} + \begin{figure} \begin{center} \includegraphics[width=24em]{fig/short_cgan_ex.png} @@ -445,6 +441,16 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} +\begin{figure} +\begin{center} +\includegraphics[width=12em]{fig/good_ex.png} +\includegraphics[width=12em]{fig/bad_ex.png} +\includegraphics[width=24em]{fig/cdcgan.png} +\caption{Convolutional CGAN+LS} +\label{fig:cdcloss} +\end{center} +\end{figure} + \begin{figure} \begin{center} \includegraphics[width=24em]{fig/fake_only.png} -- cgit v1.2.3-54-g00ecf From 328d28c6caf01464083b05d275a0044bb750767f Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 15:30:06 +0000 Subject: Further modificaations to sections III and IV --- report/paper.md | 47 ++++++++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 25 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 2a14059..3ea9b94 100644 --- a/report/paper.md +++ b/report/paper.md @@ -97,7 +97,7 @@ We evaluate permutations of the architecture involving: * Deep CGAN - 5 Dense-LeakyReLu-BN * Deep Convolutional GAN - DCGAN + conditional label input * One-Sided Label Smoothing (LS) -* Various Dropout (DO)- Use 0.1 and 0.5 Dropout parameters +* Various Dropout (DO)- Use 0.1, 0.3 and 0.5 * Virtual Batch Normalisation (VBN)- Normalisation based on one batch [@improved] \begin{figure} @@ -118,13 +118,31 @@ The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figur and their value goes below 1, meaning the GAN is approaching the theoretical 
Nash Equilibrium of 0.5. The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise. +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/med_cgan_ex.png} +\includegraphics[width=24em]{fig/med_cgan.png} +\caption{Medium CGAN} +\label{fig:cmed} +\end{center} +\end{figure} + The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{fig:cg_drop1_1} (0.1), \ref{fig:cmed}(0.3) and \ref{fig:cg_drop2_1}(0.5), both image quality and G-D losses are comparable. The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels form 1 to 0.9 to incentivize the discriminator. -Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better. Performance results for +Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better (generator behaviour is reinforced). Performance results for one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}. +\begin{figure} +\begin{center} +\includegraphics[width=24em]{fig/smoothing_ex.png} +\includegraphics[width=24em]{fig/smoothing.png} +\caption{One sided label smoothing} +\label{fig:smooth} +\end{center} +\end{figure} + Virtual Batch normalization does not affect performance significantly. Applying this technique to both the CGAN architectures used keeps G-D losses mostly unchanged. The biggest change we expect to see is a lower correlation between images in the same batch. This aspect will mostly affect performance when training a classifier with the generated images from CGAN, as we will obtain more diverse images. 
Training with a larger batch size @@ -135,16 +153,6 @@ it is possible to achieve a better performance by finer tuning of the Convolutio of the generator loss, hence the image quality varies a lot at each training step. Attempting LS on this architecture achieved a similar outcome when compared to the non-convolutional counterpart. - -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/med_cgan_ex.png} -\includegraphics[width=24em]{fig/med_cgan.png} -\caption{Medium CGAN} -\label{fig:cmed} -\end{center} -\end{figure} - # Inception Score Inception score is calculated as introduced by Tim Salimans et. al [@improved]. However as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inceptioen score. @@ -154,9 +162,7 @@ $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) We further report the classification accuracy as found with LeNet. -### Classifier Architecture Used - -\begin{table}[] +\begin{table}[H] \begin{tabular}{llll} & Accuracy & IS & GAN Tr. Time \\ \hline Shallow CGAN & 0.645 & 3.57 & 8:14 \\ @@ -181,15 +187,6 @@ We observe increased accruacy as we increase the depth of the GAN arhitecture at ### One Side Label Smoothing -\begin{figure} -\begin{center} -\includegraphics[width=24em]{fig/smoothing_ex.png} -\includegraphics[width=24em]{fig/smoothing.png} -\caption{One sided label smoothing} -\label{fig:smooth} -\end{center} -\end{figure} - One sided label smoothing involves relaxing our confidence on the labels in our data. This lowers the loss target to below 1. Tim Salimans et. al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy. 
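The smoothing scheme described above is one-sided precisely because only the real-sample targets move; a small sketch of how the discriminator targets would be built (names are ours, not the project's):

```python
def discriminator_targets(n_real, n_fake, smoothing=0.1):
    """One-sided label smoothing: real targets drop from 1 to
    1 - smoothing (0.9 here), while fake targets stay at exactly 0 so the
    generator's mistakes are not rewarded."""
    real = [1.0 - smoothing] * n_real
    fake = [0.0] * n_fake
    return real, fake

real, fake = discriminator_targets(2, 2)
print(real, fake)  # [0.9, 0.9] [0.0, 0.0]
```

Smoothing the fake side as well (e.g. to 0.1) would reinforce whatever the generator is currently producing, which matches the degradation reported above.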
### Virtual Batch Normalisation
@@ -198,7 +195,7 @@ Virtual Batch Noramlisation is a further optimisation technique proposed by Tim
### Dropout
-The effect of dropout for the non-convolutional CGAN architecture does not affect performance as much as in DCGAN, nor does it seem to affect the quality of images produced, together with the G-D loss remain almost unchanged. Results are presented in figures \ref{fig:cg_drop1_1}, \ref{fig:cg_drop1_2}, \ref{fig:cg_drop2_1}, \ref{fig:cg_drop2_2}.
+For the non-convolutional CGAN architecture, dropout does not affect performance as much as in DCGAN, nor does it seem to affect the quality of the images produced, and the G-D losses remain almost unchanged. Ultimately, judging from the inception scores, it is preferable to use a low dropout rate (in our case 0.1 seems to be the dropout rate that achieves the best results).
# Re-training the handwritten digit classifier
-- cgit v1.2.3-54-g00ecf

From 21fb715f2758f9d61acdf949c3e726a6875f90ba Mon Sep 17 00:00:00 2001
From: nunzip
Date: Wed, 13 Mar 2019 16:15:24 +0000
Subject: Additional details about G-D artificial balancing
---
report/paper.md | 63 +++++++++++++++++++++++++++++++++------------------------
1 file changed, 37 insertions(+), 26 deletions(-)
(limited to 'report/paper.md')
diff --git a/report/paper.md b/report/paper.md
index 3ea9b94..1686bc0 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -2,8 +2,6 @@
In this coursework we present two variants of the GAN architecture - DCGAN and CGAN, applied to the MNIST dataset and evaluate performance metrics across various optimisation techniques. The MNIST dataset contains 60,000 training images and 10,000 testing images of size 28x28, spread across ten classes representing the ten handwritten digits.
-## GAN
-
Generative Adversarial Networks present a system of models which learn to output data, similar to training data.
A trained GAN takes noise as an input and is able to provide an output with the same dimensions and relevant features as the samples it has been trained with.
GAN's employ two neural networks - a *discriminator* and a *generator* which contest in a zero-sum game. The task of the *discriminator* is to distinguish generated images from real images, while the task of the generator is to produce realistic images which are able to fool the discriminator.
@@ -23,6 +21,12 @@ Mode collapse is achieved with our naive *vanilla GAN* (Appendix-\ref{fig:vanill
A significant improvement to this vanilla architecture is Deep Convolutional Generative Adversarial Networks (DCGAN).
+It is possible to artificially balance the number of steps between G and D backpropagation, however we think that with a solid GAN structure this step is not
+really needed. Updating D more frequently than G resulted in additional cases of mode collapse due to the vanishing gradient issue. Updating G more
+frequently has not proved to be beneficial either, as the discriminator did not learn how to distinguish real samples from fake samples quickly enough.
+For these reasons the following sections will not present any artificial balancing of G-D training steps, opting for a standard single-step update for both
+discriminator and generator.
+
# DCGAN
## DCGAN Architecture description
@@ -62,7 +66,7 @@ We evaluate three different GAN architectures, varying the size of convolutional
\end{figure}
We observed that the deep architectures result in a more easily achievable equilibria of G-D losses.
-Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5.000 batches, reaching equilibrium quicker and with less oscillation that the Deepest DCGAN tested.
+Our medium depth DCGAN achieves very good performance, balancing both binary cross entropy losses at approximately 0.9 after 5,000 batches, reaching equilibrium quicker and with less oscillation than the deepest DCGAN tested.
@@ -113,7 +117,7 @@ When comparing the three levels of depth for the architectures it is possible to notice significant
differences in the G-D loss balancing. In a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't experience any issues with vanishing gradient, hence no mode collapse is reached.
-Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20000 batches the some pictures appear to be slightly blurry \ref{fig:clong}.
+Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20,000 batches some pictures appear to be slightly blurry \ref{fig:clong}.
The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}.
It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5. The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise. @@ -160,7 +164,8 @@ We use the logits extracted from LeNet: $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$ -We further report the classification accuracy as found with LeNet. +We further report the classification accuracy as found with LeNet. For coherence purposes the inception scores were +calculated training the LeNet classifier under the same conditions across all experiments (100 epochs with SGD optimizer, learning rate = 0.001). \begin{table}[H] \begin{tabular}{llll} @@ -207,7 +212,7 @@ injecting generated samples in the original training set to boost testing accura As observed in figure \ref{fig:mix1} we performed two experiments for performance evaluation: -* Keeping the same number of training samples while just changing the amount of real to generated data (55.000 samples in total). +* Keeping the same number of training samples while just changing the amount of real to generated data (55,000 samples in total). * Keeping the whole training set from MNIST and adding generated samples from CGAN. \begin{figure} @@ -252,7 +257,7 @@ improving testing accuracy. \end{figure} -We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160.000) +We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160,000) (learning curve show in figure \ref{fig:training_mixed}. The peak accuracy reached is 91%. We then try to remove the generated samples to apply fine tuning, using only the real samples. 
After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is boosted to 92%, making this technique the most successfull attempt of improvement while using a limited amount of data from MNIST dataset. @@ -285,7 +290,7 @@ TODO EXPLAIN WHAT WE HAVE DONE HERE \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-mnist.png}}\\ \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cgan.png}}\quad \subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}} - \caption{Visualisations PCA: a) MNIST c) CGAN | TSNE b) MNIST d) CGAN} + \caption{Visualisations: a)MNIST|PCA b)MNIST|TSNE c)CGAN-gen|PCA d)CGAN-gen|TSNE} \label{fig:features} \end{figure} @@ -314,7 +319,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} # Appendix -\begin{figure} +## DCGAN-Appendix + +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/vanilla_gan_arc.pdf} \caption{Vanilla GAN Architecture} @@ -322,7 +329,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/generic_gan_loss.png} \caption{Shallow GAN D-G Loss} @@ -330,7 +337,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/short_dcgan_ex.png} \includegraphics[width=24em]{fig/short_dcgan.png} @@ -339,7 +346,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/long_dcgan_ex.png} \includegraphics[width=24em]{fig/long_dcgan.png} @@ -348,7 +355,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} 
\includegraphics[width=24em]{fig/dcgan_dropout01_gd.png} \caption{DCGAN Dropout 0.1 G-D Losses} @@ -356,7 +363,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=14em]{fig/dcgan_dropout01.png} \caption{DCGAN Dropout 0.1 Generated Images} @@ -364,7 +371,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/dcgan_dropout05_gd.png} \caption{DCGAN Dropout 0.5 G-D Losses} @@ -372,7 +379,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=14em]{fig/dcgan_dropout05.png} \caption{DCGAN Dropout 0.5 Generated Images} @@ -380,7 +387,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +## CGAN-Appendix + +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/CDCGAN_arch.pdf} \caption{Deep Convolutional CGAN Architecture} @@ -388,7 +397,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/short_cgan_ex.png} \includegraphics[width=24em]{fig/short_cgan.png} @@ -397,7 +406,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/long_cgan_ex.png} \includegraphics[width=24em]{fig/long_cgan.png} @@ -406,7 +415,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/cgan_dropout01.png} 
\caption{CGAN Dropout 0.1 G-D Losses} @@ -414,7 +423,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=14em]{fig/cgan_dropout01_ex.png} \caption{CGAN Dropout 0.1 Generated Images} @@ -422,7 +431,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/cgan_dropout05.png} \caption{CGAN Dropout 0.5 G-D Losses} @@ -430,7 +439,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=14em]{fig/cgan_dropout05_ex.png} \caption{CGAN Dropout 0.5 Generated Images} @@ -438,7 +447,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=12em]{fig/good_ex.png} \includegraphics[width=12em]{fig/bad_ex.png} @@ -448,7 +457,9 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +## Retrain-Appendix + +\begin{figure}[H] \begin{center} \includegraphics[width=24em]{fig/fake_only.png} \caption{Retraining with generated samples only} @@ -456,7 +467,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} \end{center} \end{figure} -\begin{figure} +\begin{figure}[H] \begin{center} \includegraphics[width=12em]{fig/retrain_fail.png} \caption{Retraining failures} -- cgit v1.2.3-54-g00ecf From 4a55c6ae11fc358c6b48749264e9922b3e00698c Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 16:42:14 +0000 Subject: Add details --- report/paper.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md 
b/report/paper.md index 1686bc0..5587da4 100644 --- a/report/paper.md +++ b/report/paper.md @@ -102,7 +102,7 @@ We evaluate permutations of the architecture involving: * Deep Convolutional GAN - DCGAN + conditional label input * One-Sided Label Smoothing (LS) * Various Dropout (DO)- Use 0.1, 0.3 and 0.5 -* Virtual Batch Normalisation (VBN)- Normalisation based on one batch [@improved] +* Virtual Batch Normalisation (VBN)- Normalisation based on the statistics of one reference batch [@improved] \begin{figure} \begin{center} @@ -117,7 +117,7 @@ We evaluate permutations of the architecture involving: When comparing the three levels of depth for the architectures it is possible to notice significant differences in the balancing of the G-D losses. In a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't experience any issues with vanishing gradient, hence no mode collapse is reached. -Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20,000 batches some of the pictures appear to be slightly blurry \ref{fig:clong}. +Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20,000 batches some of the pictures appear to be slightly blurry (figure \ref{fig:clong}). The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1, meaning the GAN is approaching the theoretical Nash equilibrium, at which the discriminator outputs 0.5 for both real and generated samples. The image quality is better than the two examples reported earlier, proving that
@@ -134,7 +134,7 @@ The image quality is better than the two examples reported earlier, proving that The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{fig:cg_drop1_1} (0.1), \ref{fig:cmed}(0.3) and \ref{fig:cg_drop2_1}(0.5), both image quality and G-D losses are comparable. -The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels form 1 to 0.9 to incentivize the discriminator. +The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels form 1 to 0.9 to reinforce discriminator behaviour. Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better (generator behaviour is reinforced). Performance results for one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}. @@ -188,21 +188,20 @@ Medium CGAN+VBN+LS & 0.763 & 3.91 & 19:43 \\ ### Architecture -We observe increased accruacy as we increase the depth of the GAN arhitecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques. +We observe increased accruacy as we increase the depth of the GAN arhitecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques. Despite the initial considerations about G-D losses for the Convolutional CGAN, there seems to be an improvement in inception score and test accuracy with respect to the other analysed cases. One sided label smoothing however did not improve this performanc any further, suggesting that reinforcing discriminator behaviour does not benefit the system in this case. 
### One-Sided Label Smoothing -One sided label smoothing involves relaxing our confidence on the labels in our data. This lowers the loss target to below 1. Tim Salimans et. al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy. +One-sided label smoothing involves relaxing our confidence in the labels of our data. Tim Salimans et al. [@improved] show that smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy in the case of our baseline (Medium CGAN). ### Virtual Batch Normalisation -Virtual Batch Noramlisation is a further optimisation technique proposed by Tim Salimans et. al. [@improved]. Virtual batch normalisation is a modification to the batch normalisation layer, which performs normalisation based on statistics from a reference batch. We observe that VBN improved the classification accuracy and the Inception score. +Virtual Batch Normalisation is a further optimisation technique proposed by Tim Salimans et al. [@improved]. Virtual batch normalisation is a modification to the batch normalisation layer, which performs normalisation based on statistics from a reference batch. We observe that VBN improved the classification accuracy and the Inception score. TODO EXPLAIN WHY ### Dropout Dropout for the non-convolutional CGAN architecture does not affect performance as much as it does in DCGAN, nor does it seem to affect the quality of images produced, and the G-D losses remain almost unchanged. Ultimately, judging from the inception scores, it is preferable to use a low dropout rate (in our case 0.1 seems to be the dropout rate that achieves the best results).
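A minimal sketch of the reference-batch idea (NumPy; the full method in [@improved] additionally combines each example's own statistics with the reference statistics and applies a learned scale and offset, which this sketch omits):

```python
import numpy as np

def virtual_batch_norm(batch, ref_batch, eps=1e-5):
    """Normalise `batch` using the mean/variance of a fixed reference
    batch instead of the batch's own statistics."""
    mu = ref_batch.mean(axis=0)
    var = ref_batch.var(axis=0)
    return (batch - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
# The reference batch is drawn once, before training, and kept fixed.
reference = rng.normal(loc=5.0, scale=2.0, size=(64, 16))
activations = rng.normal(loc=5.0, scale=2.0, size=(8, 16))
out = virtual_batch_norm(activations, reference)
```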
- # Re-training the handwritten digit classifier ## Results @@ -274,6 +273,8 @@ boosted to 92%, making this technique the most successfull attempt of improvemen Examples of misclassified digits are displayed in figure \ref{fig:retrain_fail}. The results shown indicate that the network we trained is actually performing quite well, as most of the testing images that got misclassified (mainly nines and fours) show ambiguities. +\newpage + # Bonus Questions ## Relation to PCA -- cgit v1.2.3-54-g00ecf From 672bdd094082d5be99b3149269a00f94875d0698 Mon Sep 17 00:00:00 2001 From: nunzip Date: Wed, 13 Mar 2019 17:20:03 +0000 Subject: Grammar fix --- report/paper.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'report/paper.md') diff --git a/report/paper.md b/report/paper.md index 5587da4..c2c1a56 100644 --- a/report/paper.md +++ b/report/paper.md @@ -35,7 +35,7 @@ DCGAN exploits convolutional stride to perform downsampling and transposed convo We use batch normalization at the output of each convolutional layer (exception made for the output layer of the generator and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for generator) and `LeakyReLU` with slope 0.2 (for discriminator). -The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal droput rate of 0.25. +The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The convolutional layers' output in the discriminator uses dropout before feeding the next layers. We noticed a significant improvement in performance, and estimated an optimal dropout rate of 0.25. The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`.
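The activation and dropout choices in the hunk above can be written out explicitly (a NumPy sketch of the formulas only; the actual models use the framework's built-in layers):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU with negative-side slope 0.2, as in the discriminator.
    return np.where(x > 0, x, slope * x)

def dropout(x, rate=0.25, rng=None):
    # Inverted dropout: zero a fraction `rate` of the activations and
    # rescale the survivors so the expected activation is unchanged.
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))          # negative inputs scaled by 0.2
print(dropout(x, rate=0.25))  # random entries zeroed, rest scaled by 4/3
```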
The main architecture used can be observed in figure \ref{fig:dcganarc}. @@ -82,7 +82,7 @@ Applying Virtual Batch Normalization our Medium DCGAN does not provide observabl \end{figure} We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation -of the droupout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability under the form of oscillation when training for a large number of batches. +of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class due to the generator performing better than the discriminator. Conversely, a low dropout rate leads to an initial stabilisation of the G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of batches. While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating the DCGAN is less prone to a collapse compared to our *vanilla GAN*. @@ -90,7 +90,7 @@ While training the different proposed DCGAN architectures, we did not observe mo ## CGAN Architecture description -CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}.
The baseline CGAN arhitecture presents a series blocks each contained a dense layer, LeakyReLu layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu (slope=0.2) and a Droupout layer. +CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganarc}. The baseline CGAN architecture presents a series of blocks, each containing a dense layer, a LeakyReLu layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu (slope=0.2) and a Dropout layer. The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`). The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{fig:cdcganarc}. @@ -117,7 +117,7 @@ We evaluate permutations of the architecture involving: When comparing the three levels of depth for the architectures it is possible to notice significant differences in the balancing of the G-D losses. In a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't experience any issues with vanishing gradient, hence no mode collapse is reached. -Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived.
The image quality in both cases is not really high: we can see that even after 20,000 batches some of the pictures appear to be slightly blurry (figure \ref{fig:clong}). +Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not really high: we can see that even after 20,000 batches some of the pictures appear to be slightly blurry (figure \ref{fig:clong}). The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. @@ -244,7 +244,7 @@ we cannot achieve much better with this very small amount of data, since the val We conduct one experiment, feeding the test set to a LeNet trained exclusively on data generated from our CGAN. It is noticeable that training for the first 5 epochs gives good results (figure \ref{fig:fake_only}) when compared to the learning curve obtained when training the network with only the few real samples. This indicates that we can use the generated data to train the first steps of the network (initial weights) and apply the real samples for 300 epochs to obtain -a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine tuning will try and adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successfull at +a finer tuning. As observed in figure \ref{fig:few_init} the first steps of retraining will show oscillation, since the fine-tuning will try to adapt to the newly fed data. The maximum accuracy reached before the validation curve plateaus is 88.6%, indicating that this strategy proved to be somewhat successful at improving testing accuracy. \begin{figure}
We try to improve the results obtained earlier by retraining LeNet with mixed data: few real samples and plenty of generated samples (160,000) (learning curve shown in figure \ref{fig:training_mixed}). The peak accuracy reached is 91%. We then try to remove the generated samples to apply fine tuning, using only the real samples. After 300 more epochs (figure \ref{fig:training_mixed}) the test accuracy is -boosted to 92%, making this technique the most successfull attempt of improvement while using a limited amount of data from MNIST dataset. +boosted to 92%, making this technique the most successful improvement attempt while using a limited amount of data from the MNIST dataset. \begin{figure} \begin{center} @@ -307,7 +307,7 @@ TODO EXPLAIN WHAT WE HAVE DONE HERE ## Factoring in classification loss into GAN Classification accuracy and Inception score can be factored into the GAN to attempt to produce more realistic images. Shane Barratt and Rishi Sharma are able to indirectly optimise the Inception score to over 900, and note that directly optimising for a maximised Inception score produces adversarial examples [@inception-note]. -Nevertheless, a pretrained static classifier may be added to the GAN model, and it's loss incorporated into the loss added too the loss of the gan. +Nevertheless, a pretrained static classifier may be added to the GAN model, and its loss incorporated into the overall loss of the GAN. $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}} $$ -- cgit v1.2.3-54-g00ecf
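For concreteness, the combined objective above is a plain weighted sum (pure-Python sketch; $\alpha$ and $\beta$ are free hyper-parameters and the values used below are placeholders, not tuned choices):

```python
def total_loss(l_lenet, l_generator, alpha=0.5, beta=0.5):
    # L_total = alpha * L_LeNet + beta * L_generator.
    # alpha weights the pretrained classifier's loss, beta the usual
    # adversarial generator loss; 0.5/0.5 is only a placeholder choice.
    return alpha * l_lenet + beta * l_generator

# Equal weighting of a classifier loss of 0.4 and a generator loss of 0.8:
combined = total_loss(0.4, 0.8)  # 0.6, up to float rounding
```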