Diffstat (limited to 'report')
-rw-r--r-- report/paper.md 50
1 files changed, 42 insertions, 8 deletions
diff --git a/report/paper.md b/report/paper.md
index c2c1a56..b76ba5b 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -24,8 +24,6 @@ A significant improvement to this vanilla architecture is Deep Convolutional Gen
It is possible to artificially balance the number of steps between G and D backpropagation; however, we think that with a solid GAN structure this step is not
really needed. Updating D more frequently than G resulted in additional cases of mode collapse due to the vanishing gradient issue. Updating G more
frequently has not proved to be beneficial either, as the discriminator did not learn how to distinguish real samples from fake samples quickly enough.
-For these reasons the following sections will not present any artificial balancing of G-D training steps, opting for a standard single step update for both
-discriminator and generator.
# DCGAN
@@ -84,6 +82,18 @@ Applying Virtual Batch Normalization our Medium DCGAN does not provide observabl
We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation
of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class, because the generator outperforms the discriminator. Conversely, a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of batches.
+Trying different parameters for artificial G-D balancing during training did not yield any significant benefit, as discussed in section I,
+and only led to the generation of more artifacts (figure \ref{fig:baldc}). We also attempted to increase the number of D training steps relative to G,
+but no mode collapse was observed, even with the shallow model.
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=12em]{fig/bal4.png}
+\caption{DCGAN Balancing G-D; D/G=3}
+\label{fig:baldc}
+\end{center}
+\end{figure}
+
While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating that DCGAN is less prone to collapse than our *vanilla GAN*.
# CGAN
@@ -118,8 +128,7 @@ When comparing the three levels of depth for the architectures it is possible to
a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we don't
experience any issues with vanishing gradient, hence no mode collapse is reached.
Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not very high: even after 20,000 batches some pictures appear slightly blurry (figure \ref{fig:clong}).
-The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced,
-and their value goes below 1, meaning the GAN is approaching the theoretical Nash Equilibrium of 0.5.
+The best compromise is reached for 3 Dense-LeakyReLu-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1.
The image quality is better than the two examples reported earlier, proving that this Medium-depth architecture is the best compromise.
\begin{figure}
@@ -135,13 +144,12 @@ The three levels of dropout rates attempted do not affect the performance signif
image quality and G-D losses are comparable.
The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels from 1 to 0.9 to reinforce discriminator behaviour.
-Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better (generator behaviour is reinforced). Performance results for
-one-sided labels smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}.
+Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses incentive to do better (generator behaviour is reinforced).
+Performance results for one-sided label smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}.
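
The one-sided label smoothing described above can be illustrated with a minimal sketch. The function names `discriminator_targets` and `bce` are hypothetical and are not taken from the project code; the sketch only assumes that the discriminator is trained with binary cross-entropy, with real targets set to 0.9 and fake targets left at 0.

```python
import math


def discriminator_targets(n_real, n_fake, real_label=0.9, fake_label=0.0):
    """Build discriminator targets with one-sided label smoothing.

    Only the real labels are softened (1 -> 0.9); fake labels stay at 0.
    Hypothetical helper, shown for illustration only.
    """
    return [real_label] * n_real + [fake_label] * n_fake


def bce(target, pred, eps=1e-7):
    """Binary cross-entropy for a single prediction, clipped for stability."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


targets = discriminator_targets(n_real=2, n_fake=2)

# With a smoothed target, an over-confident prediction on a real sample
# is penalised more than a prediction that matches the target.
loss_matched = bce(0.9, 0.9)
loss_overconfident = bce(0.9, 0.999)
```

With a target of 0.9 the loss is minimised when the discriminator outputs 0.9 rather than 1, which discourages over-confident predictions on real samples.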
\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/smoothing_ex.png}
-\includegraphics[width=24em]{fig/smoothing.png}
\caption{One sided label smoothing}
\label{fig:smooth}
\end{center}
@@ -152,11 +160,29 @@ mostly unchanged. The biggest change we expect to see is a lower correlation bet
performance when training a classifier with the generated images from CGAN, as we will obtain more diverse images. Training with a larger batch size
would show more significant results, but since we set this parameter to 128 the issue of within-batch correlation is limited.
+As with DCGAN, changing the G-D steps did not lead to good quality results, as can be seen in figure \ref{fig:cbalance}. Here we trained
+with D/G=15 for 10,000 batches to initialize good discriminator weights, and then reverted to D/G=1 to balance the losses of the two networks.
+Even in the case of a shallow network, in which mode collapse should have been more likely, we observed diversity between the samples produced for
+the same classes, indicating that mode collapse still did not occur.
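
The D/G schedule described above (D/G=15 for the first 10,000 batches, then D/G=1) can be sketched as follows. This is only an illustration under assumptions: the helper names are hypothetical, and the actual training loop in the project may organise the updates differently.

```python
def d_steps_for_batch(batch_idx, warmup_batches=10_000, warmup_ratio=15):
    """Number of discriminator updates to run per generator update.

    During the warm-up phase D is updated `warmup_ratio` times per G update;
    afterwards the schedule reverts to a single D update (D/G=1).
    """
    return warmup_ratio if batch_idx < warmup_batches else 1


def count_updates(total_batches, warmup_batches=10_000, warmup_ratio=15):
    """Count total D and G updates over a run (hypothetical helper)."""
    d_updates = 0
    g_updates = 0
    for batch_idx in range(total_batches):
        d_updates += d_steps_for_batch(batch_idx, warmup_batches, warmup_ratio)
        g_updates += 1  # G is always updated once per batch
    return d_updates, g_updates


# Small example: 10 warm-up batches at D/G=3, then 2 batches at D/G=1.
d_total, g_total = count_updates(12, warmup_batches=10, warmup_ratio=3)
```

In a real loop, the value returned by `d_steps_for_batch` would control how many discriminator backpropagation steps precede each generator step.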
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=8em]{fig/bal1.png}
+\includegraphics[width=8em]{fig/bal2.png}
+\includegraphics[width=8em]{fig/bal3.png}
+\caption{CGAN G-D balancing results}
+\label{fig:cbalance}
+\end{center}
+\end{figure}
+
+FIX THIS
Convolutional CGAN did not achieve better results than our baseline approach for the architecture analyzed, although we believe that
it is possible to achieve a better performance by finer tuning of the Convolutional CGAN parameters. Figure \ref{fig:cdcloss} shows a very high oscillation
of the generator loss, hence the image quality varies a lot at each training step. Attempting label smoothing (LS) on this architecture achieved a similar outcome
when compared to the non-convolutional counterpart.
+ADD PIC
+
# Inception Score
Inception score is calculated as introduced by Salimans et al. [@improved]. However, as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inception score.
@@ -165,7 +191,7 @@ We use the logits extracted from LeNet:
$$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$
We further report the classification accuracy as found with LeNet. For coherence purposes the inception scores were
-calculated training the LeNet classifier under the same conditions across all experiments (100 epochs with SGD optimizer, learning rate = 0.001).
+calculated training the LeNet classifier under the same conditions across all experiments (100 epochs with `SGD`, `learning rate=0.001`).
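
As an illustration of the formula above, the following minimal sketch computes the score from a list of class-probability vectors. It assumes `probs` holds the softmax outputs $p(y\mid x)$ of the classifier for each generated image and that $p(y)$ is estimated as their mean; the function name `inception_score` is hypothetical and not taken from the project code.

```python
import math


def inception_score(probs):
    """Compute exp(E_x[KL(p(y|x) || p(y))]) from per-sample class probabilities.

    `probs` is a list of probability vectors, one per generated sample.
    The marginal p(y) is estimated as the mean of these vectors (assumption).
    """
    n = len(probs)
    k = len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]

    mean_kl = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute 0 to the KL divergence.
        kl = sum(pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0)
        mean_kl += kl / n
    return math.exp(mean_kl)


# Confident and evenly spread predictions over two classes give the maximum score.
score_confident = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Uninformative (uniform) predictions give the minimum score.
score_uniform = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score ranges from 1 (every prediction equals the marginal) up to the number of classes (confident predictions spread evenly across classes), so for MNIST the upper bound under these assumptions is 10.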
\begin{table}[H]
\begin{tabular}{llll}
@@ -458,6 +484,14 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
\end{center}
\end{figure}
+\begin{figure}[H]
+\begin{center}
+\includegraphics[width=24em]{fig/smoothing.png}
+\caption{CGAN+LS G-D Losses}
+\label{fig:smoothgd}
+\end{center}
+\end{figure}
+
## Retrain-Appendix
\begin{figure}[H]