author     Vasil Zlatanov <v@skozl.com>    2019-03-14 00:36:27 +0000
committer  Vasil Zlatanov <v@skozl.com>    2019-03-14 00:36:27 +0000
commit     00ee8e36064ed247643a68c7fa8591d5a17347d9 (patch)
tree       40fe542188c432dea040b918ea7f675fdbe4b557
parent     096ed94e1955d9f1e15f295f5a7dee74fdaa65dc (diff)
parent     5dabb5d0ba596539901ca7521402618a3b595e5f (diff)
Merge branch 'master' of skozl.com:e4-gan
 report/paper.md | 50
 1 file changed, 42 insertions(+), 8 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index c2c1a56..b76ba5b 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -24,8 +24,6 @@ A significant improvement to this vanilla architecture is Deep Convolutional Gen
 It is possible to artificially balance the number of steps between G and D backpropagation, however we think that with a solid GAN structure this step is not really needed. Updating D more frequently than G resulted in additional cases of mode collapse due to the vanishing gradient issue. Updating G more frequently has not proved to be beneficial either, as the discriminator did not learn how to distinguish real samples from fake samples quickly enough.
-For these reasons the following sections will not present any artificial balancing of G-D training steps, opting for a standard single-step update for both
-discriminator and generator.
 
 # DCGAN
 
@@ -84,6 +82,18 @@ Applying Virtual Batch Normalization our Medium DCGAN does not provide observabl
 We evaluated the effect of different dropout rates (results in appendix figures \ref{fig:dcdrop1_1}, \ref{fig:dcdrop1_2}, \ref{fig:dcdrop2_1}, \ref{fig:dcdrop2_2}) and concluded that the optimisation of the dropout hyper-parameter is essential for maximising performance. A high dropout rate results in DCGAN producing only artifacts that do not match any specific class, due to the generator performing better than the discriminator. Conversely, a low dropout rate leads to an initial stabilisation of G-D losses, but ultimately results in instability in the form of oscillation when training for a large number of batches.
 
+Trying different parameters for artificial G-D balancing in the training stage did not achieve any significant benefits, as discussed in section I,
+exclusively leading to the generation of more artifacts (figure \ref{fig:baldc}). We also attempted to increase the D training steps with respect to G,
+but no mode collapse was observed even with the shallow model.
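The artificial G-D balancing tried in the hunk above (updating D several times per G update) can be sketched as a training loop. This is an editorial illustration only, not the repository's code: `train_balanced` and the recorded update labels are hypothetical stand-ins for the real per-batch `train_on_batch` calls.

```python
# Sketch of artificially balanced G-D training: the discriminator is
# updated `d_steps_per_g` times for every single generator update
# (D/G=3 matches the configuration shown in figure fig:baldc).
# The appended labels stand in for the actual D/G weight updates.

def train_balanced(num_batches, d_steps_per_g=3):
    history = []  # records which network was updated at each step
    for _ in range(num_batches):
        for _ in range(d_steps_per_g):
            history.append("D")  # discriminator step on real + fake samples
        history.append("G")      # single generator step
    return history

history = train_balanced(num_batches=4, d_steps_per_g=3)
print(history.count("D"), history.count("G"))  # 12 4
```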
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=12em]{fig/bal4.png}
+\caption{DCGAN G-D balancing; D/G=3}
+\label{fig:baldc}
+\end{center}
+\end{figure}
+
 While training the different proposed DCGAN architectures, we did not observe mode collapse, indicating that DCGAN is less prone to collapse than our *vanilla GAN*.
 
 # CGAN
 
@@ -118,8 +128,7 @@ When comparing the three levels of depth for the architectures it is possible to
 a shallow architecture we notice a high oscillation of the generator loss (figure \ref{fig:cshort}), which is being overpowered by the discriminator. Despite this we do not experience any issues with vanishing gradients, hence no mode collapse is reached. Similarly, with a deep architecture the discriminator still overpowers the generator, and an equilibrium between the two losses is not achieved. The image quality in both cases is not very high: we can see that even after 20,000 batches some pictures appear slightly blurry (figure \ref{fig:clong}).
-The best compromise is reached for 3 Dense-LeakyReLU-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced,
-and their value goes below 1, meaning the GAN is approaching the theoretical Nash equilibrium of 0.5.
+The best compromise is reached for 3 Dense-LeakyReLU-BN blocks as shown in figure \ref{fig:cmed}. It is possible to observe that G-D losses are perfectly balanced, and their value goes below 1.
 The image quality is better than the two examples reported earlier, proving that this medium-depth architecture is the best compromise.
 
 \begin{figure}
@@ -135,13 +144,12 @@ The three levels of dropout rates attempted do not affect the performance signif
 image quality and G-D losses are comparable.
 The biggest improvement in performance is obtained through one-sided label smoothing, shifting the true labels from 1 to 0.9 to reinforce discriminator behaviour.
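The one-sided label smoothing described at the end of the hunk above amounts to softening only the real-sample targets. A minimal sketch (the helper name `smooth_labels` is illustrative, not from the repository):

```python
import numpy as np

# One-sided label smoothing: real labels are shifted from 1 to 0.9,
# while fake labels stay at exactly 0, so only the discriminator's
# confidence on real samples is tempered (the "one-sided" part).

def smooth_labels(batch_size, real_label=0.9):
    y_real = np.full((batch_size, 1), real_label)  # smoothed "real" targets
    y_fake = np.zeros((batch_size, 1))             # fake targets stay at 0
    return y_real, y_fake

y_real, y_fake = smooth_labels(4)
print(float(y_real[0, 0]), float(y_fake[0, 0]))  # 0.9 0.0
```

Raising the fake labels to 0.1 as well (two-sided smoothing) is what the following hunk reports as unhelpful, since it reinforces generator behaviour instead.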
-Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses its incentive to do better (generator behaviour is reinforced). Performance results for
-one-sided label smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}.
+Using 0.1 instead of zero for the fake labels does not improve performance, as the discriminator loses its incentive to do better (generator behaviour is reinforced).
+Performance results for one-sided label smoothing with true labels = 0.9 are shown in figure \ref{fig:smooth}.
 
 \begin{figure}
 \begin{center}
 \includegraphics[width=24em]{fig/smoothing_ex.png}
-\includegraphics[width=24em]{fig/smoothing.png}
 \caption{One-sided label smoothing}
 \label{fig:smooth}
 \end{center}
@@ -152,11 +160,29 @@ mostly unchanged. The biggest change we expect to see is a lower correlation bet
 performance when training a classifier with the generated images from CGAN, as we will obtain more diverse images. Training with a larger batch size would show more significant results, but since we set this parameter to 128 the issue of within-batch correlation is limited.
 
+Similarly to DCGAN, changing the G-D step ratio did not lead to good-quality results, as can be seen in figure \ref{fig:cbalance}, in which we tried to train
+with D/G=15 for 10,000 batches, attempting to initialise good discriminator weights, and then reverted to D/G=1, aiming to balance the losses of the two networks.
+Even in the case of a shallow network, in which mode collapse should have been more likely, we observed diversity between the samples produced for
+the same classes, indicating that mode collapse still did not occur.
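The two-phase balancing schedule added in the hunk above (D/G=15 for a discriminator warm-up, then D/G=1) reduces to a simple per-batch step count. A sketch with an illustrative function name; only the batch counts and ratios are taken from the text:

```python
# Two-phase G-D schedule tried for CGAN: D/G=15 for the first 10,000
# batches (to initialise good discriminator weights), then D/G=1
# afterwards to rebalance the two losses.

def d_steps_for_batch(batch, warmup_batches=10_000, warmup_ratio=15):
    """Return how many discriminator steps to run for this batch."""
    return warmup_ratio if batch < warmup_batches else 1

print(d_steps_for_batch(0), d_steps_for_batch(9_999), d_steps_for_batch(10_000))  # 15 15 1
```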
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=8em]{fig/bal1.png}
+\includegraphics[width=8em]{fig/bal2.png}
+\includegraphics[width=8em]{fig/bal3.png}
+\caption{CGAN G-D balancing results}
+\label{fig:cbalance}
+\end{center}
+\end{figure}
+
+Convolutional CGAN did not achieve better results than our baseline approach for the architecture analysed, although we believe that better performance could be achieved by finer tuning of the Convolutional CGAN parameters. Figure \ref{fig:cdcloss} shows a very high oscillation of the generator loss, hence the image quality varies a lot at each training step. Attempting label smoothing on this architecture achieved a similar outcome to its non-convolutional counterpart.
+
+ADD PIC
+
 # Inception Score
 
 Inception score is calculated as introduced by Tim Salimans et al. [@improved]. However, as we are evaluating MNIST, we use LeNet-5 [@lenet] as the basis of the inception score.
 
@@ -165,7 +191,7 @@ We use the logits extracted from LeNet:
 $$ \textrm{IS}(x) = \exp(\mathbb{E}_x \left( \textrm{KL} ( p(y\mid x) \| p(y) ) \right) ) $$
 We further report the classification accuracy as found with LeNet. For coherence purposes the inception scores were
-calculated training the LeNet classifier under the same conditions across all experiments (100 epochs with SGD optimizer, learning rate = 0.001).
+calculated by training the LeNet classifier under the same conditions across all experiments (100 epochs with `SGD`, `learning rate=0.001`).
 
 \begin{table}[H]
 \begin{tabular}{llll}
@@ -458,6 +484,14 @@ $$ L_{\textrm{total}} = \alpha L_{\textrm{LeNet}} + \beta L_{\textrm{generator}}
 \end{center}
 \end{figure}
 
+\begin{figure}[H]
+\begin{center}
+\includegraphics[width=24em]{fig/smoothing.png}
+\caption{CGAN+LS G-D Losses}
+\label{fig:smoothgd}
+\end{center}
+\end{figure}
+
 ## Retrain-Appendix
 
 \begin{figure}[H]
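The inception-score formula in the hunk above, $\textrm{IS}(x) = \exp(\mathbb{E}_x(\textrm{KL}(p(y\mid x) \| p(y))))$, can be computed directly from classifier logits. A minimal NumPy sketch: the random one-hot logits stand in for real LeNet-5 outputs, which this repository would supply in practice.

```python
import numpy as np

# Inception score from logits: p(y|x) via softmax, p(y) as the marginal
# over the evaluated samples, then exp of the mean per-sample KL divergence.

def inception_score(logits):
    # p(y|x): numerically stable softmax per sample
    z = logits - logits.max(axis=1, keepdims=True)
    p_yx = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_y = p_yx.mean(axis=0)  # marginal class distribution p(y)
    # KL(p(y|x) || p(y)) per sample (small epsilon guards log(0))
    kl = (p_yx * (np.log(p_yx + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Perfectly confident and perfectly diverse predictions give the maximum
# score, i.e. the number of classes (10 for MNIST); uniform p(y|x) gives 1.
ideal = np.eye(10) * 50.0  # one very confident logit row per class
print(round(inception_score(ideal), 4))  # ≈ 10.0
```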