author     Vasil Zlatanov <v@skozl.com>  2019-03-14 23:39:42 +0000
committer  Vasil Zlatanov <v@skozl.com>  2019-03-14 23:39:42 +0000
commit     bac0bc27fcba5f2d59326cf327f16d5c2cc62809 (patch)
tree       b659ea2a068f69bb670e6f851f299f2fbdba395d
parent     ff79fdd9acd1849e69d2beda574f1e9b0e2cce22 (diff)
Write TODOs and change to cDCGAN
-rw-r--r--  report/paper.md | 52
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index 098523a..a028093 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -88,14 +88,14 @@ but no mode collapse was observed even with the shallow model.
CGAN is a conditional version of a GAN which utilises labeled data. Unlike DCGAN, CGAN is trained with explicitly provided labels, which allows it to associate features with specific classes. The baseline CGAN which we evaluate is shown in figure \ref{fig:cganarc}. The baseline CGAN architecture consists of a series of blocks, each containing a dense layer, a `LeakyReLu` layer (`slope=0.2`) and a Batch Normalisation layer. The baseline discriminator uses Dense layers, followed by `LeakyReLu` (`slope=0.2`) and a Dropout layer.
The optimizer used for training is `Adam`(`learning_rate=0.002`, `beta=0.5`).
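A minimal sketch of this block structure, assuming a Keras implementation; the layer widths, latent dimension and label-conditioning scheme shown here are illustrative assumptions rather than our exact configuration:

```python
from tensorflow.keras import layers, models, optimizers

LATENT_DIM, N_CLASSES = 100, 10  # assumed sizes for illustration

def dense_block(x, units):
    # Dense -> LeakyReLU(0.2) -> Batch Normalisation, as in the baseline generator
    x = layers.Dense(units)(x)
    x = layers.LeakyReLU(0.2)(x)
    return layers.BatchNormalization()(x)

def build_generator():
    noise = layers.Input(shape=(LATENT_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")
    # Embed the class label and concatenate it with the noise vector
    l = layers.Flatten()(layers.Embedding(N_CLASSES, LATENT_DIM)(label))
    x = layers.Concatenate()([noise, l])
    for units in (256, 512, 1024):          # "Medium" CGAN: 3 blocks
        x = dense_block(x, units)
    img = layers.Dense(28 * 28, activation="tanh")(x)
    img = layers.Reshape((28, 28, 1))(img)
    return models.Model([noise, label], img)

def build_discriminator():
    img = layers.Input(shape=(28, 28, 1))
    label = layers.Input(shape=(1,), dtype="int32")
    l = layers.Flatten()(layers.Embedding(N_CLASSES, 28 * 28)(label))
    x = layers.Concatenate()([layers.Flatten()(img), l])
    for units in (512, 256):
        x = layers.Dense(units)(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dropout(0.3)(x)          # dropout (DO) value assumed
    out = layers.Dense(1, activation="sigmoid")(x)
    m = models.Model([img, label], out)
    m.compile(optimizer=optimizers.Adam(2e-3, beta_1=0.5),
              loss="binary_crossentropy")
    return m
```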
-The Convolutional CGAN (CDCGAN) analysed follows the structure presented in the relevant Appendix section. It uses TODO ADD BRIEF DESCRIPTION
+The architecture of the Deep Convolutional CGAN (cDCGAN) we analyse is presented in the Appendix. It uses transpose convolutions with a stride of two to perform upscaling, each followed by a convolutional block with unit stride. We find that a 3x3 kernel works well for all four convolutional blocks, each of which includes a Batch Normalisation and an Activation layer. The architecture assessed in this paper uses multiplication layers to combine the label embedding with the output of the `ReLu` blocks, as we found this to be more robust than incorporating the label embedding via concatenation.
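A sketch of such a generator, again assuming Keras; the 7x7 starting feature map and channel counts are illustrative assumptions, while the structural choices (stride-two transpose convolutions, stride-one 3x3 blocks with Batch Normalisation and activation, and multiplicative label conditioning) follow the description above:

```python
from tensorflow.keras import layers, models

LATENT_DIM, N_CLASSES = 100, 10  # illustrative sizes

def build_cdcgan_generator():
    noise = layers.Input(shape=(LATENT_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")

    # Project the noise to a 7x7 feature map and gate it with the label
    # embedding via element-wise multiplication (rather than concatenation).
    x = layers.Dense(7 * 7 * 128, activation="relu")(noise)
    l = layers.Flatten()(layers.Embedding(N_CLASSES, 7 * 7 * 128)(label))
    x = layers.Multiply()([x, l])
    x = layers.Reshape((7, 7, 128))(x)

    # Two upscaling stages: a stride-2 transpose convolution followed by a
    # stride-1 3x3 convolutional block, each with BatchNorm and activation.
    for filters in (128, 64):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Conv2D(filters, 3, strides=1, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)

    img = layers.Conv2D(1, 3, padding="same", activation="tanh")(x)  # 28x28x1
    return models.Model([noise, label], img)
```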
-We evaluate permutations of the architecture involving:
+The architectures and optimisation techniques we evaluate in this report are:
* Shallow CGAN - 1 `Dense-LeakyReLu` block
* Medium CGAN - 3 `Dense-LeakyReLu` blocks
* Deep CGAN - 5 `Dense-LeakyReLu` blocks
-* Deep Convolutional CGAN (CDCGAN)
+* Deep Convolutional CGAN (cDCGAN)
* One-Sided Label Smoothing (LS)
* Various Dropout (DO): 0.1, 0.3 and 0.5
* Virtual Batch Normalisation (VBN) - normalisation against a fixed reference batch rather than the current minibatch [@improved]
@@ -161,8 +161,8 @@ the same classes, indicating that mode collapse still did not occur.
\end{center}
\end{figure}
-The best performing architecture was CDCGAN. It is difficult to assess any potential improvement at this stage, since the samples produced
-between 8,000 and 13,000 batches are almost indistinguishable from the ones of the MNIST dataset (as it can be seen in figure \ref{fig:cdc}, middle). Training CDCGAN for more than
+The best performing architecture was cDCGAN. It is difficult to assess any potential improvement at this stage, since the samples produced
+between 8,000 and 13,000 batches are almost indistinguishable from those of the MNIST dataset (as can be seen in figure \ref{fig:cdc}, middle). Training cDCGAN for more than
15,000 batches is however not beneficial, as the discriminator keeps improving, causing the generator loss to increase and the samples to degrade, as shown in the reported example.
We find a good balance for 12,000 batches.
@@ -171,7 +171,7 @@ We find a good balance for 12,000 batches.
\includegraphics[width=8em]{fig/cdc1.png}
\includegraphics[width=8em]{fig/cdc2.png}
\includegraphics[width=8em]{fig/cdc3.png}
-\caption{CDCGAN outputs; 1000 batches - 12000 batches - 20000 batches}
+\caption{cDCGAN outputs; 1000 batches - 12000 batches - 20000 batches}
\label{fig:cdc}
\end{center}
\end{figure}
@@ -188,7 +188,7 @@ tend to collapse to very small regions.
\includegraphics[width=8em]{fig/cdcloss1.png}
\includegraphics[width=8em]{fig/cdcloss2.png}
\includegraphics[width=8em]{fig/cdcloss3.png}
-\caption{CDCGAN G-D loss; Left G/D=1; Middle G/D=3; Right G/D=6}
+\caption{cDCGAN G-D loss; Left G/D=1; Middle G/D=3; Right G/D=6}
\label{fig:cdcloss}
\end{center}
\end{figure}
@@ -198,7 +198,7 @@ tend to collapse to very small regions.
\includegraphics[width=8em]{fig/cdc_collapse.png}
\includegraphics[width=8em]{fig/cdc_collapse.png}
\includegraphics[width=8em]{fig/cdc_collapse.png}
-\caption{CDCGAN G/D=6 mode collapse}
+\caption{cDCGAN G/D=6 mode collapse}
\label{fig:cdccollapse}
\end{center}
\end{figure}
@@ -225,9 +225,9 @@ calculated training the LeNet classifier under the same conditions across all ex
Shallow CGAN & 0.645 & 3.57 & 8:14 \\
Medium CGAN & 0.715 & 3.79 & 10:23 \\
Deep CGAN & 0.739 & 3.85 & 16:27 \\
-\textbf{CDCGAN} & \textbf{0.899} & \textbf{7.41} & 1:05:27 \\
+\textbf{cDCGAN} & \textbf{0.899} & \textbf{7.41} & 1:05:27 \\
Medium CGAN+LS & 0.749 & 3.643 & 10:42 \\
-CDCGAN+LS & 0.846 & 6.63 & 1:12:39 \\
+cDCGAN+LS & 0.846 & 6.63 & 1:12:39 \\
cDCGAN G/D=3 & 0.849 & 6.59 & 48:11 \\
cDCGAN G/D=6 & 0.801 & 6.06 & 36:05 \\
Medium CGAN DO=0.1 & 0.761 & 3.836 & 10:36 \\
@@ -242,11 +242,11 @@ Medium CGAN+VBN+LS & 0.763 & 3.91 & 19:43 \\
### Architecture
-We observe increased accruacy as we increase the depth of the GAN arhitecture at the cost of training time. There appears to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques. CDCGAN achieves improved performance in comparison to the other cases analysed as we expected from the results obtained in the previous section, since the samples produced are almost identical to the ones of the original MNIST dataset.
+We observe increased accuracy as we increase the depth of the GAN architecture, at the cost of training time. There appear to be diminishing returns with the deeper networks, and larger improvements are achievable with specific optimisation techniques. cDCGAN achieves improved performance in comparison to the other cases analysed, as we expected from the results obtained in the previous section, since the samples produced are almost identical to the ones of the original MNIST dataset.
### One-Sided Label Smoothing
-One sided label smoothing involves relaxing our confidence on the labels in our data. Tim Salimans et. al. [@improved] show smoothing of the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy in the case of our baseline (Medium CGAN). This technique however did not improve the performance of CDCGAN any further, suggesting that reinforcing discriminator behaviour does not benefit the system in this case.
+One-sided label smoothing involves relaxing our confidence on the labels in our data. Salimans et al. [@improved] show that smoothing the positive labels reduces the vulnerability of the neural network to adversarial examples. We observe significant improvements to the Inception score and classification accuracy in the case of our baseline (Medium CGAN). This technique however did not improve the performance of cDCGAN any further, suggesting that reinforcing discriminator behaviour does not benefit the system in this case.
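In practice one-sided label smoothing only changes the targets used for real samples when training the discriminator; a minimal sketch, where the 0.9 target is the commonly used value and an assumption rather than a tuned parameter:

```python
import numpy as np

def discriminator_targets(batch_size, smooth=0.9):
    # One-sided label smoothing: soften only the positive (real) targets;
    # fake targets remain exactly 0.
    real = np.full((batch_size, 1), smooth)
    fake = np.zeros((batch_size, 1))
    return real, fake

# Hypothetical usage with a compiled Keras discriminator:
# real_t, fake_t = discriminator_targets(64)
# discriminator.train_on_batch([real_imgs, real_labels], real_t)
# discriminator.train_on_batch([gen_imgs, gen_labels], fake_t)
```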
### Virtual Batch Normalisation
@@ -256,15 +256,15 @@ Virtual Batch Normalisation is a further optimisation technique proposed by Tim
Despite the difficulties in judging differences between G-D losses and image quality, the dropout rate seems to have a noticeable effect on accuracy and inception score, with a variation of 3.6% between our best and worst dropout cases. Ultimately, judging from the measurements, it is preferable to use a low dropout rate (0.1 achieves the best results in our experiments).
-### G-D Balancing on CDCGAN
+### G-D Balancing on cDCGAN
-Despite achieving lower losses oscillation, using G/D=3 to incentivize generator training did not improve the performance of CDCGAN as it is observed from
+Despite achieving lower loss oscillation, using G/D=3 to incentivise generator training did not improve the performance of cDCGAN, as observed from
the inception score and testing accuracy. In fact we obtain 5% lower test accuracy, meaning that using this technique in our architecture produces on
-average lower quality images when compared to our standard CDCGAN.
+average lower quality images when compared to our standard cDCGAN.
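For reference, the G/D balancing above only changes how many generator updates are performed per discriminator update; a schematic training loop, assuming Keras-style `train_on_batch` models and a `combined` model that stacks the generator on a frozen discriminator (all names hypothetical):

```python
import numpy as np

def train(generator, discriminator, combined, x_train, y_train,
          batches=12_000, batch_size=64, g_per_d=3, latent_dim=100):
    """Alternate discriminator and generator updates with a G/D ratio."""
    valid = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))
    for step in range(batches):
        # One discriminator update on a real and a generated batch.
        idx = np.random.randint(0, x_train.shape[0], batch_size)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        gen_labels = np.random.randint(0, 10, (batch_size, 1))
        gen_imgs = generator.predict([noise, gen_labels], verbose=0)
        discriminator.train_on_batch([x_train[idx], y_train[idx]], valid)
        discriminator.train_on_batch([gen_imgs, gen_labels], fake)
        # g_per_d generator updates through the combined model.
        for _ in range(g_per_d):
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            gen_labels = np.random.randint(0, 10, (batch_size, 1))
            combined.train_on_batch([noise, gen_labels], valid)
```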
# Re-training the handwritten digit classifier
-*In the following section the generated data we use will be exclusively produced by our CDCGAN architecture.*
+*In the following section the generated data we use will be exclusively produced by our cDCGAN architecture.*
## Results
@@ -274,7 +274,7 @@ injecting generated samples in the original training set to boost testing accura
As observed in figure \ref{fig:mix1} we performed two experiments for performance evaluation:
* Keeping the same number of training samples while just changing the ratio of real to generated data (55,000 samples in total).
-* Keeping the whole training set from MNIST and adding generated samples from CDCGAN.
+* Keeping the whole training set from MNIST and adding generated samples from cDCGAN.
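The two regimes can be constructed as sketched below; the generated arrays are assumed to have been sampled from the trained cDCGAN beforehand, and the function names are hypothetical:

```python
import numpy as np

def mix_fixed_total(x_real, y_real, x_gen, y_gen, gen_ratio, total=55_000):
    """Regime 1: keep `total` samples, vary the real/generated ratio."""
    n_gen = int(total * gen_ratio)
    ri = np.random.choice(len(x_real), total - n_gen, replace=False)
    gi = np.random.choice(len(x_gen), n_gen, replace=False)
    x = np.concatenate([x_real[ri], x_gen[gi]])
    y = np.concatenate([y_real[ri], y_gen[gi]])
    p = np.random.permutation(len(x))
    return x[p], y[p]

def augment_full_set(x_real, y_real, x_gen, y_gen, n_extra):
    """Regime 2: keep the whole MNIST training set and append generated samples."""
    gi = np.random.choice(len(x_gen), n_extra, replace=False)
    x = np.concatenate([x_real, x_gen[gi]])
    y = np.concatenate([y_real, y_gen[gi]])
    p = np.random.permutation(len(x))
    return x[p], y[p]
```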
\begin{figure}
\begin{center}
@@ -340,12 +340,12 @@ Similarly to GANs, PCA can be used to formulate **generative** models of a syste
## Data representation
Using the classifier pre-trained on the real training examples we extract embeddings of 10,000 randomly sampled real
-test examples and 10,000 randomly sampled synthetic examples using both CGAN and CDCGAN from the different classes.
+test examples and 10,000 randomly sampled synthetic examples, generated with both CGAN and cDCGAN, across the different classes.
We obtain both PCA and TSNE representations of our data in two dimensions, shown in figure \ref{fig:features}.
-It is observable that the network that achieved a good inception score (CDCGAN) produces embeddings that are very similar
+It is observable that the network that achieved a good inception score (cDCGAN) produces embeddings that are very similar
to the ones obtained from the original MNIST dataset, further strengthening our hypothesis about the performance of this
-specific model. On the other hand, with non CDCGAN we notice higher correlation between the two represented features
+specific model. On the other hand, with the non-convolutional CGAN we notice higher correlation between the two represented features
for the different classes, meaning that good class separation was not achieved. This is probably due to the additional blur
produced around the images with our simple CGAN model.
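The extraction and projection step can be sketched as follows, assuming the LeNet classifier is a Keras model whose penultimate layer is named `"features"` (the layer name and sampling details are assumptions for illustration):

```python
import numpy as np
from tensorflow.keras import models
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_and_project(classifier, images, n_samples=10_000, seed=0):
    """Extract penultimate-layer embeddings and project them to 2D."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), n_samples, replace=False)
    # Truncate the trained classifier at its feature layer.
    feature_model = models.Model(classifier.input,
                                 classifier.get_layer("features").output)
    emb = feature_model.predict(images[idx], verbose=0)
    pca_2d = PCA(n_components=2).fit_transform(emb)
    tsne_2d = TSNE(n_components=2).fit_transform(emb)
    return pca_2d, tsne_2d
```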
@@ -357,18 +357,18 @@ produced around the images with our simple CGAN model.
\subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cgan.png}}\\
\subfloat[][]{\includegraphics[width=.2\textwidth]{fig/pca-cdc.png}}\quad
\subfloat[][]{\includegraphics[width=.2\textwidth]{fig/tsne-cdc.png}}
- \caption{Visualisations: a)MNIST|PCA b)MNIST|TSNE c)CGAN-gen|PCA d)CGAN-gen|TSNE e)CDCGAN-gen|PCA f)CDCGAN-gen|TSNE}
+ \caption{Visualisations: a)MNIST|PCA b)MNIST|TSNE c)CGAN-gen|PCA d)CGAN-gen|TSNE e)cDCGAN-gen|PCA f)cDCGAN-gen|TSNE}
\label{fig:features}
\end{figure}
-TODO COMMENT ON PR CURVES
+We present the Precision-Recall curves for MNIST against those of the dense CGAN and the convolutional cDCGAN in figure \ref{fig:rocpr}. While the superior performance of the convolutional architecture is evident, it is interesting to note that the shapes of the precision curves are similar, specifically for the digits 8 and 9. For both architectures 9 is the worst digit on average, but at higher recall a small proportion of extremely poor 8's lowers that digit to the poorest precision.
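The per-digit curves are obtained in a one-vs-rest fashion, here assuming they are computed from the LeNet classifier's softmax scores; a minimal sketch using scikit-learn (variable names are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

def plot_digit_pr_curves(y_true, y_scores):
    """y_true: integer labels of shape (N,); y_scores: softmax outputs (N, 10)."""
    for digit in range(10):
        precision, recall, _ = precision_recall_curve(
            (y_true == digit).astype(int), y_scores[:, digit])
        plt.plot(recall, precision, label=str(digit))
    plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend()
    plt.show()
```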
\begin{figure}
\centering
\subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-mnist.png}}\quad
\subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-cgan.png}}\\
\subfloat[][]{\includegraphics[width=.22\textwidth]{fig/pr-cdc.png}}
- \caption{Precisional Recall Curves a) MNIST : b) CGAN output c)CDCGAN output}
+ \caption{Precision-Recall curves: a) MNIST b) CGAN output c) cDCGAN output}
\label{fig:rocpr}
\end{figure}
@@ -517,7 +517,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/clustcollapse.png}
-\caption{CDCGAN G/D=6 Embeddings through LeNet}
+\caption{cDCGAN G/D=6 Embeddings through LeNet}
\label{fig:clustcollapse}
\end{center}
\end{figure}
@@ -525,7 +525,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\begin{figure}[H]
\begin{center}
\includegraphics[width=8em]{fig/cdcsmooth.png}
-\caption{CDCGAN+LS outputs 12000 batches}
+\caption{cDCGAN+LS outputs 12000 batches}
\label{fig:cdcsmooth}
\end{center}
\end{figure}
@@ -538,7 +538,7 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\end{center}
\end{figure}
-## CDCGAN Alternative Architecture
+## cDCGAN Alternative Architecture
\begin{figure}[H]
\begin{center}