From 4a24e5a577a9587c6841f0d42d00e249367e9b4a Mon Sep 17 00:00:00 2001
From: nunzip
Date: Thu, 7 Mar 2019 15:59:43 +0000
Subject: Almost finished DCGAN

---
 report/paper.md | 92 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 67 insertions(+), 25 deletions(-)

(limited to 'report')

diff --git a/report/paper.md b/report/paper.md
index 0227b1e..206963f 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,21 +1,14 @@
 # Introduction
 
-A Generative Adversarial Network is a system in which two blocks, discriminator and generator are competing in a "minmax game",
-in which the objective of the two blocks is respectively maximization and minimization of the function presented below,
-until an equilibrium is reached. During the weights update performed through the optimization process, the generator and discrimitaor are
-updated in alternating cycles.
+In this coursework we present two variants of the GAN architecture (DCGAN and CGAN), trained on the MNIST dataset.
+The dataset contains 60,000 training images and 10,000 test images of size 28x28, representing different digits (10 classes in total).
 
-$$ V (D,G) = E_{x~p_{data}(x)}[logD(x)] + E_{zp_z(z)}[log(1-D(G(z)))] $$
+Training a shallow GAN with no convolutional layers poses several problems: mode collapse, relatively low quality of the generated images, and unbalanced G-D losses.
 
-The issue with shallow architectures (**present the example we used for mode collapse**) can be ontain really fast training,
-while producing overall good results.
-
-One of the main issues enctoured with GAN architectures is mode collapse. As the discriminator keeps getting
-better, the generator tries to focus on one single class label to improve its loss. This issue can be observed in figure
-\ref{fig:mode_collapse}, in which we can observe how after 200 thousand iterations, the output of the generator only represents few
-of the labels originally fed to train the network. At that point the loss function of the generator starts getting worse as shown in figure
-\ref{fig:vanilla_loss}. As we observe, G-D balance in not achieved as the discriminator loss almost reaches zero, while the generator loss keeps
-increasing.
+As can be seen in figure \ref{fig:mode_collapse}, after 200,000 iterations the network (**presented in appendix XXX**) shows mode collapse,
+as the output of the generator represents only a few of the labels originally fed. At that point the loss function of the generator stops
+improving, as shown in figure \ref{fig:vanilla_loss}. G-D balance is not achieved, as the discriminator loss almost reaches zero
+while the generator loss keeps increasing.
 
 \begin{figure}
 \begin{center}
@@ -33,33 +26,82 @@ increasing.
 \end{center}
 \end{figure}
 
+A significant improvement over this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN).
 
 # DCGAN
 
 ## DCGAN Architecture description
 
-Insert connection of schematic.
+DCGAN uses strided convolutions to perform downsampling and transposed convolutions to perform upsampling.
+
+We use batch normalization at the output of each convolutional layer (except for the output layer of the generator
+and the input layer of the discriminator). The activation functions of the intermediate layers are `ReLU` (for the generator) and `LeakyReLU` with slope 0.2 (for the discriminator).
+The activation functions used for the output are `tanh` for the generator and `sigmoid` for the discriminator. The output of each convolutional layer in
+the discriminator passes through dropout before feeding the next layer. We noticed a significant improvement in performance with dropout, and estimated an optimal dropout rate of 0.25.
+The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`.
 
-The typical structure of the generator for DCGAN consists of a sequential model in which the input is fed through a dense layer and upsampled.
-The following block involves Convolution+Batch_normalization+Relu_activation. The output is then upsampled again and fed to another Convolution+Batch_Normalization+Relu_activation block. The final output is obtained through a Convolution+Tanh_activation layer. The depth of the convolutional layers decreases from input to output.
+The main architecture used can be observed in figure \ref{fig:dcganarc}.
 
-The discriminator is designed through blocks that involve Convolution+Batch_Normalization+LeakyReLU_activation+Dropout. The depth of the convolutional layers increases from input to output.
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/DCGAN_arch.pdf}
+\caption{DCGAN Architecture}
+\label{fig:dcganarc}
+\end{center}
+\end{figure}
 
 ## Tests on MNIST
 
-Try some **different architectures, hyper-parameters**, and, if necessary, the aspects of **virtual batch
-normalization**, balancing G and D.
-Please discuss, with results, what challenge and how they are specifically addressing, including
-the quality of generated images and, also, the **mode collapse**.
+We propose three architectures, varying the size of the convolutional layers in the generator while retaining the structure shown in figure \ref{fig:dcganarc}:
+
+\begin{itemize}
+\item Shallow: Conv128-Conv64
+\item Medium: Conv256-Conv128
+\item Deep: Conv512-Conv256
+\end{itemize}
 
 \begin{figure}
 \begin{center}
-\includegraphics[width=24em]{fig/error_depth_kmean100.pdf}
-\caption{K-means Classification error varying tree depth (left) and forest size (right)}
-\label{fig:km-tree-param}
+\includegraphics[width=24em]{fig/short_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/short_dcgan.png}
+\caption{Shallow DCGAN}
+\label{fig:dcshort}
 \end{center}
 \end{figure}
 
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/med_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/med_dcgan.png}
+\caption{Medium DCGAN}
+\label{fig:dcmed}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/long_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/long_dcgan.png}
+\caption{Deep DCGAN}
+\label{fig:dclong}
+\end{center}
+\end{figure}
+
+Deeper architectures make it easier to balance the G-D losses. Medium DCGAN achieves very good performance,
+balancing both binary cross-entropy losses at around 1 after 5,000 epochs and showing significantly lower oscillation over longer training, even when compared to
+Deep DCGAN.
+
+Since we are training with no labels, the generator simply tries to output images that fool the discriminator, but these do not directly map to one specific class.
+Examples of this can be observed in all the output groups reported above, as some of the shapes look very odd (but smooth enough to be labelled as real). This
+specific issue is solved by training the network for more epochs or introducing a deeper architecture, as can be deduced from a qualitative comparison
+between figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}.
+
+While training the proposed DCGAN architectures, we did not observe mode collapse, confirming that the architecture used performs better than
+the simple GAN presented in the introduction.
+
+Applying Virtual Batch Normalization to Medium DCGAN does not produce observable changes in G-D balancing, but reduces within-batch correlation. Although it
+is difficult to assess the improvements qualitatively, figure \ref{fig:} shows the results of introducing this technique.
+
 # CGAN
 
 ## CGAN Architecture description
-- 
cgit v1.2.3-54-g00ecf
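
The DCGAN description in the patch above says that strided convolutions perform the downsampling and transposed convolutions perform the upsampling. The sketch below is not part of the commit. It only illustrates the resulting size arithmetic under assumptions the text does not state: stride 2, "same" padding, two resampling layers per network, and a 7x7 starting feature map in the generator. The helper names `conv_out`, `conv_transpose_out` and `trace` are hypothetical.

```python
import math


def conv_out(size: int, stride: int) -> int:
    """Spatial size after a strided convolution, assuming 'same' padding."""
    return math.ceil(size / stride)


def conv_transpose_out(size: int, stride: int) -> int:
    """Spatial size after a transposed convolution, assuming 'same' padding."""
    return size * stride


def trace(size: int, strides: list, step) -> list:
    """Apply `step` once per stride and record the size after each layer."""
    sizes = [size]
    for stride in strides:
        size = step(size, stride)
        sizes.append(size)
    return sizes


# Discriminator side (assumed two stride-2 convolutions): 28 -> 14 -> 7
down = trace(28, [2, 2], conv_out)

# Generator side (assumed 7x7 seed, two stride-2 transposed convolutions): 7 -> 14 -> 28
up = trace(7, [2, 2], conv_transpose_out)

print(down, up)
```

Under these assumptions a 7x7 seed is the natural choice for MNIST, since two stride-2 upsampling steps then land exactly on the 28x28 image size quoted in the introduction.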