authornunzip <np.scarh@gmail.com>2019-03-13 15:14:12 +0000
committernunzip <np.scarh@gmail.com>2019-03-13 15:14:12 +0000
commiteaa279aa9a2732e967f503ecca6008a4e12329cf (patch)
tree43781cc7bb412e2aa5e6164da899a3a20f8a4754
parentde8c86ffee5cae9667de94530157d3ffef879ce3 (diff)
Write more about CGAN and add figures
-rw-r--r--report/fig/bad_ex.pngbin0 -> 15772 bytes
-rw-r--r--report/fig/cdcgan.pngbin0 -> 26406 bytes
-rw-r--r--report/fig/good_ex.pngbin0 -> 14206 bytes
-rw-r--r--report/paper.md90
4 files changed, 48 insertions, 42 deletions
diff --git a/report/fig/bad_ex.png b/report/fig/bad_ex.png
new file mode 100644
index 0000000..bdc899e
--- /dev/null
+++ b/report/fig/bad_ex.png
Binary files differ
diff --git a/report/fig/cdcgan.png b/report/fig/cdcgan.png
new file mode 100644
index 0000000..179e9a4
--- /dev/null
+++ b/report/fig/cdcgan.png
Binary files differ
diff --git a/report/fig/good_ex.png b/report/fig/good_ex.png
new file mode 100644
index 0000000..43bb567
--- /dev/null
+++ b/report/fig/good_ex.png
Binary files differ
diff --git a/report/paper.md b/report/paper.md
index 03fad67..2a14059 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -86,18 +86,19 @@ While training the different proposed DCGAN architectures, we did not observe mo
## CGAN Architecture description
-CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels which allow CGAN to associate features with specific labels. This has the intrinsic advantage of allowing us to specify the label of generated data. The baseline CGAN which we evaluate is visible in figure \ref{fig:cganrc}. The baseline CGAN arhitecture presents a series blocks each contained a dense layer, LeakyReLu layer and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLu and a Droupout layer.
+CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels, allowing it to associate features with specific labels. This has the intrinsic advantage of letting us specify the label of the generated data. The baseline CGAN which we evaluate is shown in figure \ref{fig:cganarc}. The baseline CGAN architecture presents a series of blocks, each containing a dense layer, a LeakyReLU layer (slope=0.2) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by LeakyReLU (slope=0.2) and a Dropout layer.
+The optimizer used for training is `Adam` (`learning_rate=0.002`, `beta=0.5`).
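A single generator block as described above can be sketched as the following forward pass (a minimal NumPy illustration, not our actual Keras code; the layer sizes and weight scale are hypothetical):

```python
import numpy as np

def generator_block(x, W, b, gamma=1.0, beta=0.0, slope=0.2, eps=1e-5):
    """One Dense -> LeakyReLU(slope=0.2) -> Batch Normalisation block."""
    h = x @ W + b                          # dense layer
    h = np.where(h > 0, h, slope * h)      # LeakyReLU with slope 0.2
    mu, var = h.mean(axis=0), h.var(axis=0)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta  # batch normalisation

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 100))            # batch of 128 noise vectors
W = rng.normal(size=(100, 256)) * 0.02     # illustrative weight init
out = generator_block(z, W, np.zeros(256))
print(out.shape)                           # (128, 256)
```

After normalisation each feature has zero mean and unit variance across the batch, which is what stabilises training between blocks.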
-The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{}.
+The Convolutional CGAN analysed follows a structure similar to DCGAN and is presented in figure \ref{fig:cdcganarc}.
We evaluate permutations of the architecture involving:
-* Shallow CGAN - 1 Dense-ReLu-BN block
-* Deep CGAN - 5 Dense-ReLu-BN
+* Shallow CGAN - 1 Dense-LeakyReLu-BN block
+* Deep CGAN - 5 Dense-LeakyReLu-BN
* Deep Convolutional GAN - DCGAN + conditional label input
-* Label Smoothing (One Sided) - Truth labels to 0 and $1-\alpha$ (0.9)
-* Various Dropout - Use 0.1 and 0.5 Dropout parameters
-* Virtual Batch Normalisation - Normalisation based on one batch [@improved]
+* One-Sided Label Smoothing (LS)
+* Various Dropout (DO) - use 0.1 and 0.5 dropout rates
+* Virtual Batch Normalisation (VBN) - normalisation based on one batch [@improved]
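The conditional label input mentioned in the list above can be illustrated as follows (a hypothetical NumPy sketch: the label is one-hot encoded and concatenated with the noise vector before the first generator layer; the exact conditioning mechanism in our models may differ):

```python
import numpy as np

def conditional_input(noise, labels, n_classes=10):
    """Concatenate a one-hot encoded class label with the noise vector."""
    one_hot = np.eye(n_classes)[labels]         # (batch, 10)
    return np.concatenate([noise, one_hot], axis=1)

rng = np.random.default_rng(0)
noise = rng.normal(size=(128, 100))             # 100-dim latent noise
labels = rng.integers(0, 10, size=128)          # MNIST digit classes
g_input = conditional_input(noise, labels)
print(g_input.shape)                            # (128, 110)
```

The generator then learns to associate the appended label dimensions with class-specific features, which is what lets us request a specific digit at generation time.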
\begin{figure}
\begin{center}
@@ -107,31 +108,33 @@ We evaluate permutations of the architecture involving:
\end{center}
\end{figure}
-\begin{figure}
-\begin{center}
-\includegraphics[width=24em]{fig/CDCGAN_arch.pdf}
-\caption{Deep Convolutional CGAN Architecture}
-\label{fig:cdcganarc}
-\end{center}
-\end{figure}
-
## Tests on MNIST
When comparing the three levels of depth for the architectures it is possible to notice significant differences for the G-D losses balancing. In
-a shallow architecture we notice a high oscillation of the generator loss \ref{fig:}, which is being overpowered by the discriminator. Despite this we don't
+a shallow architecture we notice high oscillation of the generator loss (figure \ref{fig:cshort}), as it is overpowered by the discriminator. Despite this we do not
experience any issues with vanishing gradients, hence no mode collapse occurs.
-Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not acheived. The image quality in both cases is not really high: we can see that even after 20000 batches the some pictures appear to be slightly blurry \ref{fig:}.
+Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not high: even after 20,000 batches some pictures appear slightly blurry (figure \ref{fig:clong}).
+The best compromise is reached with 3 Dense-LeakyReLU-BN blocks, as shown in figure \ref{fig:cmed}. The G-D losses are well balanced,
+and their value drops below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5.
+The image quality is better than in the two examples reported earlier, confirming that this medium-depth architecture is the best compromise.
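For reference, the equilibrium discussed above relates to the standard GAN objective: for a fixed generator the optimal discriminator is

$$ D^{*}(x) = \frac{p_{\textrm{data}}(x)}{p_{\textrm{data}}(x) + p_{g}(x)} $$

so when the generator matches the data distribution ($p_{g} = p_{\textrm{data}}$) the discriminator can do no better than outputting $D^{*}(x) = 0.5$ for every sample, real or generated.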
-The three levels of dropout rates attempted do not affect the performance significantly, and as we can see in figures \ref{}, \ref{} and \ref{}, both
+The three dropout rates attempted do not affect performance significantly: as shown in figures \ref{fig:cg_drop1_1} (0.1), \ref{fig:cmed} (0.3) and \ref{fig:cg_drop2_1} (0.5), both
image quality and G-D losses are comparable.
The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels from 1 to 0.9 to incentivize the discriminator.
Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better. Performance results for
-one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{}.
+one-sided label smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}.
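The one-sided label smoothing described above amounts to a one-line change in the discriminator targets (a minimal sketch; `alpha=0.1` yields the true label of 0.9 used in our experiments):

```python
import numpy as np

def discriminator_targets(batch_size, alpha=0.1):
    """One-sided label smoothing: real targets become 1 - alpha,
    fake targets are left at 0 (smoothing only one side)."""
    real = np.full((batch_size, 1), 1.0 - alpha)   # 0.9 instead of 1
    fake = np.zeros((batch_size, 1))               # unchanged
    return real, fake

real, fake = discriminator_targets(128)
print(real[0, 0], fake[0, 0])  # 0.9 0.0
```

Smoothing only the real side is deliberate: raising the fake targets to 0.1 as well would, as noted above, remove the discriminator's incentive to improve.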
+
+Virtual Batch Normalisation does not affect performance significantly. Applying this technique to both CGAN architectures leaves the G-D losses
+mostly unchanged. The biggest change we expect to see is a lower correlation between images in the same batch. This aspect mostly affects
+performance when training a classifier with images generated by CGAN, as we obtain more diverse images. Training with a larger batch size
+would show more significant results, but since we set this parameter to 128 the issue of within-batch correlation is limited.
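Virtual Batch Normalisation [@improved] normalises each batch using statistics from a fixed reference batch chosen at the start of training, so samples are not coupled through their own batch statistics. A simplified sketch (the full technique also mixes in the example's own statistics; our implementation details may differ):

```python
import numpy as np

def virtual_batch_norm(x, ref_batch, eps=1e-5):
    """Normalise x with the mean/variance of a fixed reference batch,
    decoupling the samples in x from each other."""
    mu = ref_batch.mean(axis=0)
    var = ref_batch.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
ref = rng.normal(size=(128, 256))   # reference batch, fixed at training start
x = rng.normal(size=(128, 256))     # current batch
out = virtual_batch_norm(x, ref)
print(out.shape)                    # (128, 256)
```

Because `mu` and `var` never depend on the current batch, the normalisation of one sample cannot leak information about its batch-mates, which is the source of the reduced within-batch correlation discussed above.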
-ADD FORMULA?
+Convolutional CGAN did not achieve better results than our baseline approach for the architecture analysed, although we believe
+better performance could be achieved through finer tuning of the Convolutional CGAN parameters. Figure \ref{fig:cdcloss} shows a very high oscillation
+of the generator loss, hence the image quality varies considerably between training steps. Attempting LS on this architecture achieved an outcome similar
+to its non-convolutional counterpart.
-ADD VBN TALKING ABOUT TIME AND RESULTS
\begin{figure}
\begin{center}
@@ -155,7 +158,7 @@ We further report the classification accuracy as found with LeNet.
\begin{table}[]
\begin{tabular}{llll}
- & Accuracy & Inception Sc. & GAN Tr. Time \\ \hline
+ & Accuracy & IS & GAN Tr. Time \\ \hline
Shallow CGAN & 0.645 & 3.57 & 8:14 \\
Medium CGAN & 0.715 & 3.79 & 10:23 \\
Deep CGAN & 0.739 & 3.85 & 16:27 \\
@@ -164,8 +167,8 @@ Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\
Convolutional CGAN+LS & 0.601 & 2.494 & 27:36 \\
Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\
Medium CGAN DO=0.5 & 0.725 & 3.677 & 10:36 \\
-Medium CGAN+VBN & 0.745 & 4.02 & 19:38 \\
-Medium CGAN+VBN+LS & 0.783 & 4.31 & 19:43 \\
+Medium CGAN+VBN & 0.735 & 3.82 & 19:38 \\
+Medium CGAN+VBN+LS & 0.763 & 3.91 & 19:43 \\
*MNIST original & 0.9846 & 9.685 & N/A \\ \hline
\end{tabular}
\end{table}
@@ -174,7 +177,7 @@ Medium CGAN+VBN+LS & 0.783 & 4.31 & 19:43 \\
### Architecture
-We observe increased accruacy as we increase the depth of the arhitecture at the cost of the training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques.
+We observe increased accuracy as we increase the depth of the GAN architecture, at the cost of training time. There appear to be diminishing returns with deeper networks, and larger improvements are achievable through specific optimisation techniques.
### One Side Label Smoothing
@@ -275,24 +278,9 @@ as most of the testing images that got misclassified (mainly nines and fours) sh
Similarly to GANs, PCA can be used to formulate **generative** models of a system. While GANs are trained neural networks, PCA is a deterministic statistical procedure which performs orthogonal transformations of the data. Both attempt to identify the most important or most *variant* features of the data (which we may then use to generate new data), but PCA by itself is only able to extract linearly related features. In a purely linear system, a GAN would converge to PCA. In a more complicated system, we would need to identify relevant kernels in order to extract relevant features with PCA, while a GAN is able to leverage dense and convolutional neural network layers which may be trained to perform the relevant transformations.
-* This is an open question. Do you have any other ideas to improve GANs or
-have more insightful and comparative evaluations of GANs? Ideas are not limited. For instance,
-
-\begin{itemize}
+## Data representation
-\item How do you compare GAN with PCA? We leant PCA as another generative model in the
-Pattern Recognition module (EE468/EE9SO29/EE9CS729). Strengths/weaknesses?
-
-\item Take the pre-trained classification network using 100% real training examples and use it
-to extract the penultimate layer’s activations (embeddings) of 100 randomly sampled real
-test examples and 100 randomly sampled synthetic examples from all the digits i.e. 0-9.
-Use an embedding method e.g. t-sne [1] or PCA, to project them to a 2D subspace and
-plot them. Explain what kind of patterns do you observe between the digits on real and
-synthetic data. Also plot the distribution of confidence scores on these real and synthetic
-sub-sampled examples by the classification network trained on 100% real data on two
-separate graphs. Explain the trends in the graphs.
-
-\end{itemize}
+TODO EXPLAIN WHAT WE HAVE DONE HERE
\begin{figure}
\centering
@@ -397,6 +385,14 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}
\begin{center}
+\includegraphics[width=24em]{fig/CDCGAN_arch.pdf}
+\caption{Deep Convolutional CGAN Architecture}
+\label{fig:cdcganarc}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
\includegraphics[width=24em]{fig/short_cgan_ex.png}
\includegraphics[width=24em]{fig/short_cgan.png}
\caption{Shallow CGAN}
@@ -447,6 +443,16 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}
\begin{center}
+\includegraphics[width=12em]{fig/good_ex.png}
+\includegraphics[width=12em]{fig/bad_ex.png}
+\includegraphics[width=24em]{fig/cdcgan.png}
+\caption{Convolutional CGAN+LS}
+\label{fig:cdcloss}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
\includegraphics[width=24em]{fig/fake_only.png}
\caption{Retraining with generated samples only}
\label{fig:fake_only}