# Codebooks

## K-means codebook 

A common technique for codebook generation is to run K-means clustering on a sample of the
image descriptors. Each descriptor is then mapped to a *visual word*, which lends itself to
binning and therefore to the creation of bag-of-words histograms for classification.

In this coursework, 100,000 descriptors sampled from the Caltech dataset have been used to build
the visual vocabulary.
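
A minimal sketch of this step, assuming the descriptors are stacked in an `(N, 128)` NumPy array and that scikit-learn's `KMeans` is used (the actual coursework code may differ; `build_codebook` and its arguments are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, vocab_size=100, sample_size=100_000, seed=0):
    """Cluster a random subsample of descriptors into `vocab_size` visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=min(sample_size, len(descriptors)),
                     replace=False)
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed)
    kmeans.fit(descriptors[idx])   # the fitted centroids are the visual vocabulary
    return kmeans

# descriptors: (N, 128) array of descriptors pooled over the training images
# codebook = build_codebook(descriptors, vocab_size=100)
```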

## Vocabulary size 

The number of clusters (centroids) determines the vocabulary size when creating the codebook with the K-means method. Each descriptor is mapped to its nearest centroid, so all descriptors belonging to a given cluster are mapped to the same *visual word*. Similar descriptors are therefore assigned to the same word, allowing images to be compared through bag-of-words techniques.
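
In code, quantisation is a nearest-centroid lookup against the fitted codebook; a short sketch reusing the hypothetical `codebook` object from above:

```python
def quantise(image_descriptors, codebook):
    """Map each descriptor to the index of its nearest centroid, i.e. its visual word."""
    return codebook.predict(image_descriptors)   # one word index per descriptor

# words = quantise(image_descriptors, codebook)  # e.g. array([3, 17, 3, 42, ...])
```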

## Bag-of-words histogram quantisation of descriptor vectors

An example histogram for a training image is shown in figure \ref{fig:histo_tr}, computed with a vocabulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms have similar counts for the same words, showing that the two images contain descriptors matching the visual *keywords* in similar proportions. We later look at the effect of the vocabulary size (as determined by the number of K-means centroids) on the classification accuracy in figure \ref{fig:km_vocsize}.
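
A short sketch of how one such histogram can be computed from the quantised descriptors of a single image (the normalisation and variable names are illustrative assumptions):

```python
import numpy as np

def bow_histogram(words, vocab_size):
    """Bag-of-words histogram: how often each visual word occurs in one image."""
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)   # normalise so images with different descriptor counts compare

# X_train = np.stack([bow_histogram(quantise(d, codebook), 100) for d in train_descriptors])
```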

The time complexity of building a K-means codebook is $O(n^{dk+1})$, where $n$ is the number of entities to be clustered, $d$ is the dimension and $k$ is the cluster count [@km-complexity]. As the computation time is high, our tests use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE

\begin{figure}[H]
\begin{center}
\includegraphics[height=4em]{fig/hist_test.jpg}
\includegraphics[width=20em]{fig/km-histogram.pdf}
\caption{Bag-of-words Training histogram}
\label{fig:histo_tr}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[height=4em]{fig/hist_train.jpg}
\includegraphics[width=20em]{fig/km-histtest.pdf}
\caption{Bag-of-words Testing histogram}
\label{fig:histo_te}
\end{center}
\end{figure}

# RF classifier 

## Hyperparameter tuning

Figure \ref{fig:km-tree-param} shows the effect of tree depth and of the number of trees
for the K-means codebook with 100 cluster centres.

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/error_depth_kmean100.pdf}
\includegraphics[width=12em]{fig/trees_kmean.pdf}
\caption{Classification error varying tree depth (left) and number of trees (right)}
\label{fig:km-tree-param}
\end{center}
\end{figure}
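
Curves of this kind can be produced with a simple parameter sweep; the sketch below assumes scikit-learn's `RandomForestClassifier` and bag-of-words matrices `X_train`, `y_train`, `X_test`, `y_test` built as in the previous section (the `sweep` helper is illustrative, not the coursework code):

```python
from sklearn.ensemble import RandomForestClassifier

def sweep(param_name, values, fixed=None):
    """Train one forest per value of `param_name` and record the test error."""
    errors = []
    for v in values:
        params = dict(fixed or {})
        params[param_name] = v
        clf = RandomForestClassifier(random_state=0, **params)
        clf.fit(X_train, y_train)                      # bag-of-words features, class labels
        errors.append(1.0 - clf.score(X_test, y_test))
    return errors

# depth_error = sweep("max_depth", range(2, 16), fixed={"n_estimators": 100})
# trees_error = sweep("n_estimators", [10, 50, 100, 200, 400], fixed={"max_depth": 5})
```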

Figure \ref{fig:kmeanrandom} shows the effect of the randomness parameter for the K-means codebook with 100 cluster centres.

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/new_kmean_random.pdf}
\caption{Effect of the randomness parameter (K-means, 100 cluster centres)}
\label{fig:kmeanrandom}
\end{center}
\end{figure}
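
Assuming the same hypothetical `sweep` helper, this experiment can be expressed as below; mapping the toolbox's randomness parameter onto scikit-learn's `max_features` is an assumption:

```python
# "Randomness" is taken here to be the number of split candidates tried per node,
# exposed by scikit-learn as max_features (an assumption about the toolbox used).
rand_error = sweep("max_features", [1, 2, 5, 10, 25, 50, 100],
                   fixed={"n_estimators": 100, "max_depth": 5})
```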

## Weak learner comparison

Figure \ref{fig:2pt} shows an improvement in recognition accuracy of 1% with the two-pixel test,
which achieves better results than its axis-aligned counterpart. The two-pixel test, however,
brings a slight decrease in time performance, measured to be on **average 3 seconds** more.
This is due to the extra complexity of the two-pixel test, since it adds one dimension to the computation.

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/2pixels_kmean.pdf}
\caption{K-means classification accuracy for different types of weak learner}
\label{fig:2pt}
\end{center}
\end{figure}
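
The two weak learners differ only in the split function evaluated at each node; the following is a schematic sketch of the two tests on a feature vector `x`, not the toolbox implementation:

```python
def axis_aligned_split(x, dim, threshold):
    """Axis-aligned test: threshold a single feature of the bag-of-words vector x."""
    return x[dim] > threshold

def two_pixel_split(x, dim_a, dim_b, threshold):
    """Two-pixel test: threshold the difference of two features, adding one extra
    degree of freedom (and a little extra work) at every node."""
    return x[dim_a] - x[dim_b] > threshold
```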

## Impact of the vocabulary size on classification accuracy

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/kmeans_vocsize.pdf}
\includegraphics[width=12em]{fig/time_kmeans.pdf}
\caption{Effect of vocabulary size: classification error (left) and time (right)}
\label{fig:km_vocsize}
\end{center}
\end{figure}

## Confusion matrix for case XXX, with examples of failure and success 

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/e100k256d5_cm.pdf}
\caption{K-means codebook confusion matrix (case e100k256d5)}
\label{fig:km_cm}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=10em]{fig/success_km.pdf}
\includegraphics[width=10em]{fig/fail_km.pdf}
\caption{K-means: success (left); failure (right)}
\label{fig:km_succ}
\end{center}
\end{figure}

# RF codebook

In this section the K-means codebook of Q1 is replaced with a random-forest codebook: a random
forest is applied to the 128-dimensional descriptor vectors together with their image category
labels, and the RF leaves are used as the visual vocabulary. With the bag-of-words representations
of images obtained from the RF codebook, a Random Forest classifier is trained and tested as in Q2.
Different parameters of the RF codebook and of the RF classifier are explored, and the results are
discussed in comparison with those of Q2, including the vector quantisation complexity.
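
A minimal sketch of this pipeline, assuming scikit-learn: the forest is fitted on the labelled 128-dimensional descriptors, `apply()` gives the leaf reached by each descriptor in every tree, and the concatenated per-tree leaf histograms form the bag-of-words vector (function and variable names are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_rf_codebook(descriptors, labels, n_trees=10, depth=5, seed=0):
    """Fit a small forest on labelled descriptors; its leaves act as visual words."""
    rf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth,
                                random_state=seed)
    rf.fit(descriptors, labels)          # labels = class of the image each descriptor came from
    return rf

def rf_bow_histogram(image_descriptors, rf):
    """Concatenate, over trees, the histogram of leaf indices reached by the descriptors."""
    leaves = rf.apply(image_descriptors)           # shape: (n_descriptors, n_trees)
    hists = []
    for t, tree in enumerate(rf.estimators_):
        counts = np.bincount(leaves[:, t], minlength=tree.tree_.node_count).astype(float)
        hists.append(counts / max(counts.sum(), 1.0))
    return np.concatenate(hists)                   # one fixed-length BoW vector per image
```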

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/256t1_e200D5_cm.pdf}
\caption{RF codebook confusion matrix (e100k256d5)}
\label{fig:p3_cm}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=10em]{fig/success_3.pdf}
\includegraphics[width=10em]{fig/fail_3.pdf}
\caption{RF codebook: success (left); failure (right)}
\label{fig:p3_succ}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/error_depth_p3.pdf}
\includegraphics[width=12em]{fig/trees_p3.pdf}
\caption{Classification error varying tree depth (left) and number of trees (right)}
\label{fig:p3_trees}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/p3_rand.pdf}
\caption{Effect of randomness parameter on classification error}
\label{fig:p3_rand}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/p3_vocsize.pdf}
\includegraphics[width=12em]{fig/p3_time.pdf}
\caption{Effect of vocabulary size: classification error (left) and time (right)}
\label{fig:p3_voc}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/p3_colormap.pdf}
\caption{Varying leaves and estimators: effect on accuracy}
\label{fig:p3_colormap}
\end{center}
\end{figure}

# Comparison of methods and conclusions

# References