Diffstat (limited to 'report')
-rw-r--r--  report/fig/CGAN_arch.pdf  bin  0 -> 386712 bytes
-rw-r--r--  report/fig/long_dcgan_ex.pdf (renamed from report/fig/large_dcgan_ex.pdf)  bin  329497 -> 329497 bytes
-rw-r--r--  report/paper.md  92
3 files changed, 67 insertions, 25 deletions
diff --git a/report/fig/CGAN_arch.pdf b/report/fig/CGAN_arch.pdf
new file mode 100644
index 0000000..bb4cfa9
--- /dev/null
+++ b/report/fig/CGAN_arch.pdf
Binary files differ
diff --git a/report/fig/large_dcgan_ex.pdf b/report/fig/long_dcgan_ex.pdf
index 9dac5e5..9dac5e5 100644
--- a/report/fig/large_dcgan_ex.pdf
+++ b/report/fig/long_dcgan_ex.pdf
Binary files differ
diff --git a/report/paper.md b/report/paper.md
index e1a78c6..55a0a63 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,21 +1,14 @@
# Introduction
-A Generative Adversarial Network is a system in which two blocks, discriminator and generator are competing in a "minmax game",
-in which the objective of the two blocks is respectively maximization and minimization of the function presented below,
-until an equilibrium is reached. During the weights update performed through the optimization process, the generator and discrimitaor are
-updated in alternating cycles.
+In this coursework we present two variants of the GAN architecture (DCGAN and CGAN), trained on the MNIST dataset.
+The dataset contains 60,000 training images and 10,000 testing images of size 28x28, representing handwritten digits (10 classes in total).
-$$ V (D,G) = E_{x~p_{data}(x)}[logD(x)] + E_{zp_z(z)}[log(1-D(G(z)))] $$
+Training a shallow GAN with no convolutional layers poses multiple problems: mode collapse, relatively low quality of the generated images, and unbalanced G-D losses.
-The issue with shallow architectures (**present the example we used for mode collapse**) can be ontain really fast training,
-while producing overall good results.
-
-One of the main issues enctoured with GAN architectures is mode collapse. As the discriminator keeps getting
-better, the generator tries to focus on one single class label to improve its loss. This issue can be observed in figure
-\ref{fig:mode_collapse}, in which we can observe how after 200 thousand iterations, the output of the generator only represents few
-of the labels originally fed to train the network. At that point the loss function of the generator starts getting worse as shown in figure
-\ref{fig:vanilla_loss}. As we observe, G-D balance in not achieved as the discriminator loss almost reaches zero, while the generator loss keeps
-increasing.
+As can be seen in figure \ref{fig:mode_collapse}, after 200,000 iterations the network (**presented in appendix XXX**) shows mode collapse:
+the output of the generator represents only a few of the labels it was originally trained on. At that point the loss function of the generator stops
+improving, as shown in figure \ref{fig:vanilla_loss}. As we observe, G-D balance is not achieved, as the discriminator loss almost reaches zero
+while the generator loss keeps increasing.
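
The losses referred to above come from the standard alternating update scheme, in which the discriminator and the generator are trained in turn on binary cross-entropy objectives. The sketch below is a minimal, hypothetical Keras version of that loop (layer sizes and optimizer settings are illustrative, not the exact coursework configuration); it is only meant to make the G-D loss curves we discuss concrete.

```python
# Minimal sketch of the alternating G/D updates whose losses we monitor.
# Layer sizes and optimizer settings are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

def build_vanilla_gan(latent_dim=100):
    generator = models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(28 * 28, activation="tanh"),
        layers.Reshape((28, 28, 1)),
    ])
    discriminator = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(256),
        layers.LeakyReLU(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")
    # Freeze the discriminator inside the combined model so that only the
    # generator is updated when training on the generator loss.
    discriminator.trainable = False
    combined = models.Sequential([generator, discriminator])
    combined.compile(optimizer="adam", loss="binary_crossentropy")
    return generator, discriminator, combined

def train_step(x_real, generator, discriminator, combined, latent_dim=100):
    # x_real: a mini-batch of MNIST images scaled to [-1, 1], shape (N, 28, 28, 1).
    batch = x_real.shape[0]
    noise = np.random.normal(0, 1, (batch, latent_dim))
    x_fake = generator.predict(noise, verbose=0)
    # Discriminator update: real images labelled 1, generated images labelled 0.
    d_loss = 0.5 * (discriminator.train_on_batch(x_real, np.ones((batch, 1)))
                    + discriminator.train_on_batch(x_fake, np.zeros((batch, 1))))
    # Generator update: try to make the discriminator output 1 for fakes.
    g_loss = combined.train_on_batch(noise, np.ones((batch, 1)))
    return d_loss, g_loss  # the two curves plotted in the loss figures
```

In this loop, mode collapse corresponds to `d_loss` approaching zero while `g_loss` keeps growing, which is the behaviour shown in figure \ref{fig:vanilla_loss}.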
\begin{figure}
\begin{center}
@@ -33,33 +26,82 @@ increasing.
\end{center}
\end{figure}
+A significant improvement over this vanilla architecture is the Deep Convolutional Generative Adversarial Network (DCGAN).
# DCGAN
## DCGAN Architecture description
-Insert connection of schematic.
+DCGAN uses strided convolutions to perform downsampling and transposed convolutions to perform upsampling.
+
+We use batch normalization at the output of each convolutional layer (except for the output layer of the generator
+and the input layer of the discriminator). The activation function of the intermediate layers is `ReLU` for the generator and `LeakyReLU` with slope 0.2 for the discriminator.
+The output activation is `tanh` for the generator and `sigmoid` for the discriminator. In the discriminator, the output of each convolutional layer
+passes through dropout before feeding the next layer; this gave a significant improvement in performance, with an estimated optimal dropout rate of 0.25.
+The optimizer used for training is `Adam(learning_rate=0.002, beta=0.5)`.
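
As a concrete illustration of the blocks just described, the following is a hedged Keras sketch: kernel sizes, the dense projection and the exact layer count are assumptions, while the filter widths match the Medium variant discussed in the next section.

```python
# Illustrative Keras sketch of the DCGAN blocks described above.
# Kernel sizes and the 7x7 dense projection are assumptions.
from tensorflow.keras import layers, models, optimizers

def build_generator(latent_dim=100, filters=(256, 128)):
    f1, f2 = filters
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(7 * 7 * f1),
        layers.Reshape((7, 7, f1)),
        # Transposed convolutions perform the upsampling (7 -> 14 -> 28).
        layers.Conv2DTranspose(f1, 3, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(f2, 3, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        # Generator output layer: no batch normalization, tanh activation.
        layers.Conv2D(1, 3, padding="same", activation="tanh"),
    ])

def build_discriminator():
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        # Discriminator input layer: strided convolution, no batch normalization.
        layers.Conv2D(128, 3, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.25),
        layers.Conv2D(256, 3, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])

# Optimizer as quoted in the text; we read `beta=0.5` as Adam's beta_1.
optimizer = optimizers.Adam(learning_rate=0.002, beta_1=0.5)
```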
-The typical structure of the generator for DCGAN consists of a sequential model in which the input is fed through a dense layer and upsampled.
-The following block involves Convolution+Batch_normalization+Relu_activation. The output is then upsampled again and fed to another Convolution+Batch_Normalization+Relu_activation block. The final output is obtained through a Convolution+Tanh_activation layer. The depth of the convolutional layers decreases from input to output.
+The main architecture used can be observed in figure \ref{fig:dcganarc}.
-The discriminator is designed through blocks that involve Convolution+Batch_Normalization+LeakyReLU_activation+Dropout. The depth of the convolutional layers increases from input to output.
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/DCGAN_arch.pdf}
+\caption{DCGAN Architecture}
+\label{fig:dcganarc}
+\end{center}
+\end{figure}
## Tests on MNIST
-Try some **different architectures, hyper-parameters**, and, if necessary, the aspects of **virtual batch
-normalization**, balancing G and D.
-Please discuss, with results, what challenge and how they are specifically addressing, including
-the quality of generated images and, also, the **mode collapse**.
+We propose three different architectures, varying the size of the convolutional layers in the generator while retaining the structure proposed in figure \ref{fig:dcganarc} (a minimal sketch of how these variants can be instantiated follows the list):
+
+\begin{itemize}
+\item Shallow: Conv128-Conv64
+\item Medium: Conv256-Conv128
+\item Deep: Conv512-Conv256
+\end{itemize}
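
Reusing the hypothetical `build_generator` sketch from the architecture section, the three variants differ only in the two generator filter widths:

```python
# The three proposed generators differ only in the widths of the two
# (transposed) convolutional layers; everything else is shared.
variants = {
    "Shallow": (128, 64),
    "Medium": (256, 128),
    "Deep": (512, 256),
}
generators = {name: build_generator(filters=f) for name, f in variants.items()}
```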
\begin{figure}
\begin{center}
-\includegraphics[width=24em]{fig/error_depth_kmean100.pdf}
-\caption{K-means Classification error varying tree depth (left) and forest size (right)}
-\label{fig:km-tree-param}
+\includegraphics[width=24em]{fig/short_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/short_dcgan.png}
+\caption{Shallow DCGAN}
+\label{fig:dcshort}
\end{center}
\end{figure}
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/med_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/med_dcgan.png}
+\caption{Medium DCGAN}
+\label{fig:dcmed}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=24em]{fig/long_dcgan_ex.pdf}
+\includegraphics[width=24em]{fig/long_dcgan.png}
+\caption{Deep DCGAN}
+\label{fig:dclong}
+\end{center}
+\end{figure}
+
+It is possible to notice that deeper architectures make it easier to balance the G-D losses. The Medium DCGAN achieves very good performance,
+balancing both binary cross-entropy losses at around 1 after 5,000 epochs, and shows significantly lower oscillation over longer training, even when compared to
+the Deep DCGAN.
+
+Since we are training with no labels, the generator will simply try to output images that fool the discriminator, but these images do not directly map to one specific class.
+Examples of this can be observed in all the output groups reported above, as some of the shapes look very odd (but smooth enough to be labelled as real). This
+specific issue is addressed by training the network for more epochs or by introducing a deeper architecture, as can be deduced from a qualitative comparison
+of figures \ref{fig:dcshort}, \ref{fig:dcmed} and \ref{fig:dclong}.
+
+While training the different proposed DCGAN architectures we did not observe mode collapse, confirming that these architectures perform better than
+the simple GAN presented in the introduction.
+
+Applying Virtual Batch Normalization to the Medium DCGAN does not produce observable changes in G-D balancing, but it reduces within-batch correlation. Although it
+is difficult to assess the improvement qualitatively, figure \ref{fig:} shows the results of introducing this technique.
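
For reference, Virtual Batch Normalization normalizes each activation with statistics computed from a fixed reference batch rather than from the current mini-batch, which is what removes the within-batch correlation mentioned above. The layer below is a simplified, hypothetical sketch of the idea (the original formulation also mixes the current example into the statistics), not the exact implementation used for the experiments.

```python
# Simplified Virtual Batch Normalization: activations are normalized with
# statistics from a fixed reference batch chosen at the start of training,
# so samples in the current mini-batch are not coupled through normalization.
import tensorflow as tf

class VirtualBatchNorm(tf.keras.layers.Layer):
    def __init__(self, reference_batch, epsilon=1e-5):
        super().__init__()
        # Reference statistics are computed once and kept fixed.
        self.ref_mean = tf.reduce_mean(reference_batch, axis=0, keepdims=True)
        self.ref_var = tf.math.reduce_variance(reference_batch, axis=0, keepdims=True)
        self.epsilon = epsilon

    def build(self, input_shape):
        self.gamma = self.add_weight(name="gamma", shape=input_shape[1:], initializer="ones")
        self.beta = self.add_weight(name="beta", shape=input_shape[1:], initializer="zeros")

    def call(self, inputs):
        # Normalize with the reference statistics instead of mini-batch statistics.
        x_hat = (inputs - self.ref_mean) / tf.sqrt(self.ref_var + self.epsilon)
        return self.gamma * x_hat + self.beta
```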
+
# CGAN
## CGAN Architecture description