# Introduction 

A Generative Adversarial Network (GAN) is a system in which two blocks, the discriminator and the generator, compete in a "minimax game":
the discriminator tries to maximize, and the generator to minimize, the value function presented below,
until an equilibrium is reached. During optimization, the weights of the generator and the discriminator are
updated in alternating cycles.

$$ V(D,G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1-D(G(z)))] $$
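
As an illustration, the value function can be estimated from a batch of discriminator outputs by replacing each expectation with a sample mean. The sketch below is only a minimal example under that assumption; the function name `value_function` and the sample values are hypothetical and not taken from our implementation.

```python
import math


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: values of D(x) for samples x drawn from the data distribution.
    d_fake: values of D(G(z)) for noise samples z drawn from p_z.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# An undecided discriminator outputs 0.5 everywhere, which gives V = -2 log 2.
v_undecided = value_function([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V towards its upper bound of 0.
v_confident = value_function([0.99, 0.98], [0.01, 0.02])
```

The discriminator update increases this quantity, while the generator update decreases the second term by making $D(G(z))$ larger.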

Shallow architectures (**present the example we used for mode collapse**) can train very quickly
while producing good results overall.

One of the main issues encountered with GAN architectures is mode collapse. As the discriminator keeps getting
better, the generator tends to focus on a single class label to improve its loss. This issue can be observed in figure
\ref{fig:mode_collapse}: after 200 thousand iterations, the output of the generator represents only a few
of the labels originally fed to train the network. At that point the generator loss starts getting worse, as shown in figure
\ref{fig:vanilla_loss}. The G-D balance is not achieved: the discriminator loss almost reaches zero, while the generator loss keeps
increasing.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/generic_gan_loss.png}
\caption{Shallow GAN D-G Loss}
\label{fig:vanilla_loss}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/generic_gan_mode_collapse.pdf}
\caption{Shallow GAN mode collapse}
\label{fig:mode_collapse}
\end{center}
\end{figure}


# DCGAN

## DCGAN Architecture description

Insert connection of schematic.

The typical structure of the generator for DCGAN consists of a sequential model in which the input is fed through a dense layer and upsampled.
The following block consists of Convolution+Batch_Normalization+ReLU_activation. The output is then upsampled again and fed to another Convolution+Batch_Normalization+ReLU_activation block. The final output is obtained through a Convolution+Tanh_activation layer. The depth (number of filters) of the convolutional layers decreases from input to output.

The discriminator is built from blocks of Convolution+Batch_Normalization+LeakyReLU_activation+Dropout. The depth of the convolutional layers increases from input to output.
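
To make the generator structure concrete, the sketch below traces the tensor shape after each stage of a generator of this kind. It does not build a network and is not our actual model: the base size of 7, the filter counts and the function name `trace_generator_shapes` are assumptions chosen only so that the output matches a 28x28 single-channel MNIST image, and convolutions are assumed to use "same" padding.

```python
def trace_generator_shapes(base_size=7, base_depth=128,
                           conv_depths=(128, 64), out_channels=1):
    """Return (stage name, (height, width, depth)) for each generator stage.

    All default values are illustrative assumptions, not measured settings.
    """
    # Dense layer followed by a reshape into a small feature map.
    shapes = [("dense+reshape", (base_size, base_size, base_depth))]
    size, depth = base_size, base_depth
    for next_depth in conv_depths:
        size *= 2  # 2x upsampling doubles height and width
        shapes.append(("upsample", (size, size, depth)))
        depth = next_depth  # 'same'-padded convolution keeps the spatial size
        shapes.append(("conv+bn+relu", (size, size, depth)))
    # Final convolution with tanh activation produces the image.
    shapes.append(("conv+tanh", (size, size, out_channels)))
    return shapes


for stage, shape in trace_generator_shapes():
    print(f"{stage:>14}: {shape}")
```

With these assumed values the spatial size grows from 7 to 28 while the depth shrinks from 128 to 1, which is the decreasing-depth pattern described above.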

## Tests on MNIST

Try some **different architectures, hyper-parameters**, and, if necessary, the aspects of **virtual batch
normalization**, balancing G and D.
Please discuss, with results, which challenges arise and how they are specifically addressed, including
the quality of generated images and, also, the **mode collapse**.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/error_depth_kmean100.pdf}
\caption{K-means Classification error varying tree depth (left) and forest size (right)}
\label{fig:km-tree-param}
\end{center}
\end{figure}

# CGAN

## CGAN Architecture description

## Tests on MNIST

Try **different architectures, hyper-parameters**, and, if necessary, the aspects of **one-sided label
smoothing**, **virtual batch normalization**, balancing G and D.
Please perform qualitative analyses on the generated images, and discuss, with results, which
challenges arise and how they are specifically addressed. Is there a **mode collapse issue?**
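
One-sided label smoothing replaces the discriminator target for real samples with a value slightly below 1, while the target for generated samples stays at 0. The sketch below is a minimal illustration with plain Python lists; the smoothing value 0.9 and the helper names are assumptions, not settings taken from our experiments.

```python
import math


def discriminator_targets(n_real, n_fake, smooth=0.9):
    """Targets for one discriminator batch with one-sided label smoothing.

    Only the real targets are smoothed; fake targets remain exactly 0.
    """
    real_targets = [smooth] * n_real
    fake_targets = [0.0] * n_fake
    return real_targets, fake_targets


def binary_cross_entropy(prediction, target):
    """Binary cross-entropy for a single prediction in the open interval (0, 1)."""
    return -(target * math.log(prediction)
             + (1.0 - target) * math.log(1.0 - prediction))


real_targets, fake_targets = discriminator_targets(n_real=3, n_fake=2)

# With a smoothed target, an over-confident prediction is penalised more
# than a prediction that matches the target.
loss_matched = binary_cross_entropy(0.9, real_targets[0])
loss_overconfident = binary_cross_entropy(0.99, real_targets[0])
```

The intended effect is to discourage the discriminator from becoming over-confident on real samples, which relates to the G-D balance discussed in this section.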

# Inception Score

## Classifier Architecture Used

## Results

Measure the inception scores, i.e. use the class labels to
generate images with the CGAN and compare these labels with the labels predicted for the generated images.
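
The score described here is the fraction of generated images whose predicted label matches the label used to condition the CGAN. Note that this is a label-agreement accuracy rather than the entropy-based inception score found in the literature. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def label_agreement(conditioning_labels, predicted_labels):
    """Fraction of generated images classified as their conditioning label."""
    if len(conditioning_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not conditioning_labels:
        raise ValueError("at least one label is required")
    matches = sum(c == p for c, p in zip(conditioning_labels, predicted_labels))
    return matches / len(conditioning_labels)


# Made-up example: four generated digits, one of them misclassified.
score = label_agreement([0, 1, 2, 3], [0, 1, 2, 8])
```

In practice the predicted labels would come from the classifier described in this section, applied to images generated by the CGAN.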

Also report the recognition accuracies on the
MNIST real testing set (10K), in comparison to the inception scores.

**Please measure and discuss the inception scores for the different hyper-parameters/tricks and/or
architectures in Q2.**

# Re-training the handwritten digit classifier

## Results

Retrain with different portions and test BOTH fake and real queries. Please **vary** the portions
of the real training and synthetic images, e.g. 10%, 20%, 50%, and 100%, of each.
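
A possible way to build such mixed training sets is to sub-sample each source by its portion and shuffle the result. The sketch below uses only the standard library; the function name `mix_training_set`, the fixed seed and the toy data are assumptions for illustration.

```python
import random


def mix_training_set(real, synthetic, real_fraction, synthetic_fraction, seed=0):
    """Sample the given fractions of real and synthetic examples and shuffle them."""
    rng = random.Random(seed)  # fixed seed so that a split can be reproduced
    n_real = round(len(real) * real_fraction)
    n_synthetic = round(len(synthetic) * synthetic_fraction)
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed


# Toy data: each example is tagged with its source instead of holding an image.
real_examples = [("real", i) for i in range(10)]
synthetic_examples = [("synthetic", i) for i in range(20)]

# 50% of the real examples combined with 10% of the synthetic ones.
mixed = mix_training_set(real_examples, synthetic_examples, 0.5, 0.1)
```

Repeating the call for each pair of portions (e.g. 10%, 20%, 50% and 100%) yields the training sets to compare.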

## Adapted Training Strategy

*Using even a small number of real samples per class would already give a high recognition rate,
which is difficult to improve. Use few real samples per class and plenty of good-quality generated
images, and see whether the testing accuracy can be improved over the model trained using
the few real samples only.
Did you have to change the strategy in training the classification network in order to improve the
testing accuracy? For example, use synthetic data to initialise the network parameters, followed
by fine-tuning the parameters with the real data set. Or use realistic synthetic data selected by the
confidence score from the classification network pre-trained on real data. If yes, please then
specify your training strategy in detail.
Analyse and discuss the outcome of the experimental result.*
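
One of the strategies mentioned above, selecting synthetic data by classifier confidence, could be sketched as follows. This is an illustration only: the threshold of 0.9, the function name `filter_by_confidence` and the example probabilities are assumptions, and the class probabilities are expected to come from a classifier pre-trained on real data.

```python
def filter_by_confidence(samples, class_probabilities, intended_labels,
                         threshold=0.9):
    """Keep synthetic samples that the classifier assigns to the intended
    label with a confidence of at least `threshold`."""
    kept = []
    for sample, probs, label in zip(samples, class_probabilities, intended_labels):
        # Index of the most probable class according to the classifier.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label and probs[predicted] >= threshold:
            kept.append((sample, label))
    return kept


# Made-up three-class example with placeholder sample identifiers.
samples = ["img_a", "img_b", "img_c"]
probabilities = [
    [0.95, 0.03, 0.02],  # confident and correct: kept
    [0.50, 0.30, 0.20],  # correct class but low confidence: dropped
    [0.02, 0.96, 0.02],  # confident but wrong class: dropped
]
kept = filter_by_confidence(samples, probabilities, intended_labels=[0, 0, 2])
```

Only the retained samples would then be added to the few real samples when training the classifier.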

# Bonus

This is an open question. Do you have any other ideas to improve GANs or
have more insightful and comparative evaluations of GANs? Ideas are not limited. For instance,

\begin{itemize}

\item How do you compare GAN with PCA? We learnt PCA as another generative model in the
Pattern Recognition module (EE468/EE9SO29/EE9CS729). Strengths/weaknesses?

\item Take the pre-trained classification network using 100% real training examples and use it
to extract the penultimate layer’s activations (embeddings) of 100 randomly sampled real
test examples and 100 randomly sampled synthetic examples from all the digits i.e. 0-9.
Use an embedding method, e.g. t-SNE [1] or PCA, to project them to a 2D subspace and
plot them. Explain what kind of patterns you observe between the digits on real and
synthetic data. Also plot the distribution of confidence scores on these real and synthetic
sub-sampled examples by the classification network trained on 100% real data on two
separate graphs. Explain the trends in the graphs.

\item Can we add a classification loss (using the pre-trained classifier) to CGAN, and see if this
improves the results? The classification loss would help the generated images maintain the class
labels, i.e. improving the inception score. What would be the respective network
architecture and loss function? 

\end{itemize}

# References

<div id="refs"></div>

\newpage

# Appendix