author     nunzip <np.scarh@gmail.com>  2019-03-07 15:59:43 +0000
committer  nunzip <np.scarh@gmail.com>  2019-03-07 15:59:43 +0000
commit     4a24e5a577a9587c6841f0d42d00e249367e9b4a (patch)
tree       35ebb4e95a506756bd0f6809eb247977fd2eb204
parent     c9958b93e9d2e2ea9b7e7556a02736835f905df4 (diff)
Almost finished DCGAN
-rw-r--r--  report/paper.md  92
1 file changed, 67 insertions(+), 25 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index 0227b1e..206963f 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,21 +1,14 @@
# Introduction
-A Generative Adversarial Network is a system in which two blocks, discriminator and generator, compete in a "minimax game",
-in which the objective of the two blocks is respectively the maximization and minimization of the function presented below,
-until an equilibrium is reached. During the weight updates performed through the optimization process, the generator and discriminator are
-updated in alternating cycles.
+In this coursework we present two variants of GAN architectures (DCGAN and CGAN) trained on the MNIST dataset.
+The dataset contains 60,000 training images and 10,000 testing images of size 28x28, representing different digits (10 classes in total).
-$$ V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))] $$
+Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse, relatively low quality of the generated images, and unbalanced G-D losses.
-The issue with shallow architectures (**present the example we used for mode collapse**) is that they can obtain really fast training,
-while producing overall good results.
-
-One of the main issues encountered with GAN architectures is mode collapse. As the discriminator keeps getting
-better, the generator tries to focus on one single class label to improve its loss. This issue can be observed in figure
-\ref{fig:mode_collapse}, in which we can see how, after 200 thousand iterations, the output of the generator only represents a few
-of the labels originally fed to train the network. At that point the loss function of the generator starts getting worse, as shown in figure
-\ref{fig:vanilla_loss}. As we observe, G-D balance is not achieved, as the discriminator loss almost reaches zero while the generator loss keeps
-increasing.
+As can be seen in figure \ref{fig:mode_collapse}, after 200,000 iterations the network (**presented in appendix XXX**) shows mode collapse,
+as the output of the generator only represents a few of the labels it was originally fed. At that point the loss function of the generator stops
+improving, as shown in figure \ref{fig:vanilla_loss}. As we observe, G-D balance is not achieved, as the discriminator loss almost reaches zero
+while the generator loss keeps increasing.
\begin{figure}
\begin{center}
@@ -33,33 +26,82 @@ increasing.
\end{center}
\end{figure}
+A significant improvement over this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN).
# DCGAN
## DCGAN Architecture description
-Insert connection of schematic.
+DCGAN exploits strided convolutions to perform downsampling and transposed convolutions to perform upsampling.
+
+We use batch normalization at the output of each convolutional layer (except for the output layer of the generator
+and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for the generator) and `LeakyReLU` with slope 0.2 (for the discriminator).
+The activation functions used for the outputs are `tanh` for the generator and `sigmoid` for the discriminator. The output of each convolutional layer in
+the discriminator is passed through dropout before feeding the next layer. We noticed a significant improvement in performance, and estimated an optimal dropout rate of 0.25.
+The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`.
-The typical structure of the generator for DCGAN consists of a sequential model in which the input is fed through a dense layer and upsampled.
-The following block involves Convolution+Batch_normalization+Relu_activation. The output is then upsampled again and fed to another Convolution+Batch_Normalization+Relu_activation block. The final output is obtained through a Convolution+Tanh_activation layer. The depth of the convolutional layers decreases from input to output.
+The main architecture used can be observed in figure \ref{fig:dcganarc}.
-The discriminator is designed through blocks that involve Convolution+Batch_Normalization+LeakyReLU_activation+Dropout. The depth of the convolutional layers increases from input to output.
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/DCGAN_arch.pdf}
+\caption{DCGAN Architecture}
+\label{fig:dcganarc}
+\end{center}
+\end{figure}
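+
+The following is a minimal `tf.keras` sketch of the blocks described above. The filter widths, kernel sizes and latent dimension are our own assumptions for illustration; the exact layout is the one shown in figure \ref{fig:dcganarc}.
+
+```python
+# Minimal DCGAN sketch (assumed hyper-parameters, for illustration only).
+import tensorflow as tf
+from tensorflow.keras import layers, models, optimizers
+
+LATENT_DIM = 100  # assumed size of the generator's noise input
+
+def build_generator(first_conv_filters=256, second_conv_filters=128):
+    """Dense projection followed by two transposed-convolution upsampling blocks."""
+    return models.Sequential([
+        layers.Dense(7 * 7 * first_conv_filters, input_dim=LATENT_DIM),
+        layers.Reshape((7, 7, first_conv_filters)),
+        # upsampling block: transposed convolution + batch norm + ReLU
+        layers.Conv2DTranspose(second_conv_filters, 5, strides=2, padding="same"),
+        layers.BatchNormalization(),
+        layers.ReLU(),
+        layers.Conv2DTranspose(64, 5, strides=2, padding="same"),
+        layers.BatchNormalization(),
+        layers.ReLU(),
+        # output layer: no batch normalization, tanh activation
+        layers.Conv2D(1, 5, padding="same", activation="tanh"),
+    ])
+
+def build_discriminator():
+    """Strided convolutions with LeakyReLU(0.2) and dropout 0.25."""
+    return models.Sequential([
+        # input layer: no batch normalization
+        layers.Conv2D(64, 5, strides=2, padding="same", input_shape=(28, 28, 1)),
+        layers.LeakyReLU(0.2),
+        layers.Dropout(0.25),
+        layers.Conv2D(128, 5, strides=2, padding="same"),
+        layers.BatchNormalization(),
+        layers.LeakyReLU(0.2),
+        layers.Dropout(0.25),
+        layers.Flatten(),
+        layers.Dense(1, activation="sigmoid"),
+    ])
+
+# The report's beta=0.5 corresponds to Adam's beta_1 in tf.keras.
+discriminator = build_discriminator()
+discriminator.compile(optimizer=optimizers.Adam(learning_rate=0.002, beta_1=0.5),
+                      loss="binary_crossentropy")
+```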
## Tests on MNIST
-Try some **different architectures, hyper-parameters**, and, if necessary, the aspects of **virtual batch
-normalization**, balancing G and D.
-Please discuss, with results, what challenge and how they are specifically addressing, including
-the quality of generated images and, also, the **mode collapse**.
+We propose three different architectures, varying the size of the convolutional layers in the generator while retaining the structure proposed in figure \ref{fig:dcganarc} (a parameterized sketch follows the list below):
+
+\begin{itemize}
+\item Shallow: Conv128-Conv64
+\item Medium: Conv256-Conv128
+\item Deep: Conv512-Conv256
+\end{itemize}
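+
+As a usage illustration, the three variants could be instantiated by reusing `build_generator` from the sketch above; the variant names and the helper mapping below are ours, not part of the report's code.
+
+```python
+# Hypothetical mapping from variant name to the two generator Conv widths listed above.
+GENERATOR_VARIANTS = {
+    "shallow": (128, 64),
+    "medium": (256, 128),
+    "deep": (512, 256),
+}
+
+generators = {name: build_generator(first_conv_filters=f1, second_conv_filters=f2)
+              for name, (f1, f2) in GENERATOR_VARIANTS.items()}
+```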
\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/error_depth_kmean100.pdf}
-\caption{K-means Classification error varying tree depth (left) and forest size (right)}
-\label{fig:km-tree-param}
+\includegraphics[width=24em]{fig/short_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/short_dcgan.png}
+\caption{Shallow DCGAN}
+\label{fig:dcshort}
\end{center}
\end{figure}
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/med_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/med_dcgan.png}
+\caption{Medium DCGAN}
+\label{fig:dcmed}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/long_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/long_dcgan.png}
+\caption{Deep DCGAN}
+\label{fig:dclong}
+\end{center}
+\end{figure}
+
+It is possible to notice that deeper architectures make it easier to balance the G-D losses. Medium DCGAN achieves very good performance,
+balancing both binary cross entropy losses at around 1 after 5,000 epochs, and showing significantly lower oscillation during longer training even when compared to
+Deep DCGAN.
+
+Since we are training with no labels, the generator will simply try to output images that fool the discriminator, but that do not directly map to one specific class.
+Examples of this can be observed for all the output groups reported above, as some of the shapes look very odd (but smooth enough to be labelled as real). This
+specific issue is solved by training the network for more epochs or introducing a deeper architecture, as can be deduced from a qualitative comparison
+between figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}.
+
+While training the different proposed DCGAN architectures, we did not observe mode collapse, confirming that these architectures perform better than
+the simple GAN presented in the introduction.
+
+Applying Virtual Batch Normalization to Medium DCGAN does not provide observable changes in G-D balancing, but reduces within-batch correlation. Although it
+is difficult to qualitatively assess the improvements, figure \ref{fig:} shows the results of introducing this technique.
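+
+Virtual Batch Normalization normalizes each example using statistics computed from a fixed reference batch (combined with the example itself) rather than from the current mini-batch. Below is a minimal NumPy sketch of the idea; the function and parameter names are assumptions, not the report's implementation.
+
+```python
+import numpy as np
+
+def virtual_batch_norm(x, ref_batch, gamma=1.0, beta=0.0, eps=1e-5):
+    """Normalize each example in x with the mean/variance of a fixed
+    reference batch combined with that example itself."""
+    out = np.empty_like(x, dtype=float)
+    for i, xi in enumerate(x):
+        combined = np.concatenate([ref_batch, xi[None]], axis=0)
+        mu = combined.mean(axis=0)
+        var = combined.var(axis=0)
+        out[i] = gamma * (xi - mu) / np.sqrt(var + eps) + beta
+    return out
+```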
+
# CGAN
## CGAN Architecture description