# Formulation of the Addressed Machine Learning Problem
## Problem Definition
The person re-identification problem addressed in this paper requires matching
pedestrian images, produced by pedestrian detectors, across disjoint cameras. This
problem is challenging, as identities captured in photos are subject to varying
lighting, pose, blur, background and occlusion across camera views. This report
considers features extracted from the CUHK03 dataset using a 50-layer residual
network (ResNet50), and evaluates distance metric techniques that can be used to
perform person re-identification across *disjoint* cameras with these features.
## Dataset - CUHK03 Summary
The CUHK03 dataset contains 14096 pictures of people captured from two
different cameras. The feature vectors used, extracted from a trained ResNet50
model, contain 2048 features per image.
The pictures represent 1467 distinct identities, each of which appears 9 to 10
times. The data is separated into train, query and gallery sets by `train_idx`,
`query_idx` and `gallery_idx` respectively, where the training set was used
to train the ResNet50 model used for feature extraction. This split allows
distance metric learning techniques to be evaluated on the query and gallery
sets in the knowledge that the features are not overfitted to them, since they
were extracted with a model derived only from the training set.
## Problem to Solve
The problem to solve is to create a ranklist for each image of the query set
by finding its nearest neighbor(s) within the gallery set. However, gallery
images with the same label and taken from the same camera as the query image
should not be considered when forming the ranklist.
## Nearest Neighbor ranklist
Nearest Neighbor aims to find the gallery image whose features are the closest to
those of a query image, predicting the class of the query image to be the same
as that of its nearest neighbor(s). The distance between images can be calculated
with different metrics; one of the most commonly used is the squared Euclidean
distance:
$$ \textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\|^2 $$
*The square root in the Euclidean distance is omitted, as it does not
affect the ranking by distance.*
Alternative distance metrics, such as the Jaccard and Mahalanobis distances,
can be used in place of the Euclidean distance.
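As an illustration of the procedure above, the following sketch ranks gallery images for one query by squared Euclidean distance while excluding same-label, same-camera gallery entries. The helper name `nn_ranklist` and its signature are hypothetical, not the implementation used in this paper.

```python
import numpy as np

def nn_ranklist(query, gallery, gallery_labels, gallery_cams,
                query_label, query_cam, k=10):
    """Rank gallery images by squared Euclidean distance to the query,
    excluding gallery entries sharing both the query's label and camera."""
    # Squared Euclidean distance to every gallery feature vector
    dists = np.sum((gallery - query) ** 2, axis=1)
    # Mask out same-label, same-camera gallery images
    invalid = (gallery_labels == query_label) & (gallery_cams == query_cam)
    dists[invalid] = np.inf
    # Indices of the k closest valid gallery images
    return np.argsort(dists)[:k]
```

Repeating this for every query image yields one ranklist per query, from which the accuracy figures below can be computed.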
# Baseline Evaluation
To evaluate the improvements brought by alternative distance metrics, a baseline
is established through nearest neighbor identification as previously described.
Identification accuracies at top-1, top-5 and top-10 are 47%, 67% and 75%
respectively (figure \ref{fig:baselineacc}). The mAP for a ranklist of size 10 is 33.3%.
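For reference, mAP is the mean over queries of the average precision of each ranklist. A minimal sketch of per-query average precision (the function name `average_precision` is illustrative):

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    """AP for one query: mean of precision@k over the ranks holding a
    correct match. mAP is the mean of this value over all queries."""
    hits = np.asarray(ranked_labels) == query_label
    if not hits.any():
        return 0.0
    # Precision at each rank where a correct match occurs
    prec_at_hit = np.cumsum(hits)[hits] / (np.flatnonzero(hits) + 1)
    return float(prec_at_hit.mean())
```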
\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/baseline.pdf}
\caption{Recognition accuracy of baseline Nearest Neighbor @rank k}
\label{fig:baselineacc}
\end{center}
\end{figure}
Figure \ref{fig:eucrank} shows the ranklists generated through the baseline NN for
5 query images (black). Correct identifications are shown in green and incorrect
identifications in red.
\begin{figure}
\begin{center}
\includegraphics[width=22em]{fig/eucranklist.png}
\caption{Ranklist @rank10 generated for 5 query images}
\label{fig:eucrank}
\end{center}
\end{figure}
Normalization of the feature vectors does not improve the accuracy of the
baseline, as can be seen in figure \ref{fig:baselineacc}. A likely explanation
is that the ResNet50 feature vectors already have similar magnitudes, so scaling
each to unit norm barely changes the relative ordering of distances.
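The normalization in question can be sketched as follows; note that for unit vectors the squared Euclidean distance becomes $\|u-v\|^2 = 2 - 2\,u \cdot v$, i.e. monotonic in cosine similarity, so ranking changes only insofar as the original norms differed.

```python
import numpy as np

def l2_normalise(feats, eps=1e-12):
    """Scale each feature vector (one per row) to unit L2 norm."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, eps)
```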
## k-Means Clustering
An addition considered for the baseline is *k-means clustering*. In theory this
method reduces the computational complexity of the baseline NN by forming clusters
and comparing the query image only against the cluster centers. The elements
associated with the closest cluster center are then used to perform NN and
classify the query image.
This method did not bring any major improvement over the baseline, as can be seen
in figure \ref{fig:baselineacc}. It is noticeable how the number of clusters
affects performance, with better identification accuracy for cluster counts away
from the local minimum at 60 clusters (figure \ref{fig:kmeans}). A possible
explanation is that at intermediate cluster counts the query is more often
assigned to a cluster that does not contain its true match, whereas very small
counts leave most of the gallery in the searched cluster and very large counts
approach plain NN.
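A minimal sketch of this pruning scheme, using a few Lloyd iterations implemented directly in NumPy (the helper name `kmeans_nn` and the initialisation details are assumptions, not the exact implementation evaluated here):

```python
import numpy as np

def kmeans_nn(query, gallery, n_clusters=8, n_iter=20, seed=0):
    """Approximate NN: cluster the gallery, then search only the
    cluster whose centre is closest to the query."""
    rng = np.random.default_rng(seed)
    # Initialise centres from random distinct gallery points
    centres = gallery[rng.choice(len(gallery), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each gallery vector to its nearest centre
        assign = np.argmin(((gallery[:, None] - centres) ** 2).sum(-1), axis=1)
        # Recompute centres, leaving empty clusters where they are
        for c in range(n_clusters):
            if np.any(assign == c):
                centres[c] = gallery[assign == c].mean(axis=0)
    # Compare the query only against members of the closest cluster
    best = np.argmin(((centres - query) ** 2).sum(-1))
    members = np.flatnonzero(assign == best)
    order = np.argsort(((gallery[members] - query) ** 2).sum(-1))
    return members[order]
```

The saving comes from comparing the query against `n_clusters` centres plus one cluster's members, rather than the whole gallery; the risk, consistent with the results above, is that the true match may fall outside the searched cluster.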
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/kmeanacc.pdf}
\caption{Top 1 Identification accuracy varying kmeans cluster size}
\label{fig:kmeans}
\end{center}
\end{figure}
# Suggested Improvement
## k-reciprocal Reranking
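At the core of k-reciprocal reranking (with the parameters K1, K2 and $\lambda$ swept in the figures below) is the k-reciprocal neighbour set: a gallery sample counts as a reliable neighbour of the query only if each appears in the other's k-nearest-neighbour list. A minimal sketch of that set, assuming a precomputed pairwise distance matrix (illustrative only, not the exact implementation evaluated here):

```python
import numpy as np

def k_reciprocal_neighbors(dist, i, k):
    """k-reciprocal neighbours of sample i under the full pairwise
    distance matrix `dist`: keep j only if i and j each appear in the
    other's k-nearest-neighbour list."""
    # k nearest neighbours of i (position 0 is i itself, distance 0)
    forward = np.argsort(dist[i])[:k + 1]
    keep = [j for j in forward
            if j != i and i in np.argsort(dist[j])[:k + 1]]
    return np.array(keep)
```

The full method then expands these sets, encodes them as weighted vectors, and blends a Jaccard distance between them with the original distance via $\lambda$; the mutual-neighbour constraint above is what filters out false matches that are merely close one-way.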
\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/ranklist.png}
\caption{Ranklist (improved method) @rank10 generated for 5 query images}
\label{fig:ranklist2}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/comparison.pdf}
\caption{Comparison of recognition accuracy @rank k (KL=0.3,K1=9,K2=3)}
\label{fig:compare}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/pqvals.pdf}
\includegraphics[width=12em]{fig/trainpqvals.pdf}
\caption{Identification accuracy varying K1 and K2 (gallery-query left, train right)}
\label{fig:pqvals}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/lambda_acc.pdf}
\includegraphics[width=12em]{fig/lambda_acc_tr.pdf}
\caption{Top 1 Identification Accuracy with Rerank varying lambda(gallery-query left, train right)}
\label{fig:lambda}
\end{center}
\end{figure}
# Comment on Mahalanobis Distance as a Metric
We were not able to achieve significant improvements using the Mahalanobis
distance for the original distance ranking compared to the squared Euclidean
metric. Results can be observed using the `-m|--mahalanobis` flag when running
the evaluation with the repository complementing this paper.
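For completeness, the Mahalanobis distance is $d(x,y)^2 = (x-y)^\top M (x-y)$ with $M$ the inverse covariance of the training features; $M = I$ recovers the squared Euclidean distance. A minimal sketch (helper names and the regularisation constant are assumptions):

```python
import numpy as np

def mahalanobis_sq(x, y, inv_cov):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ inv_cov @ d)

def inv_covariance(train_feats, eps=1e-6):
    """Regularised inverse covariance estimated from training features
    (rows are samples); eps stabilises the inversion."""
    cov = np.cov(train_feats, rowvar=False)
    return np.linalg.inv(cov + eps * np.eye(cov.shape[0]))
```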
# Conclusion
# References
# Appendix
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/cdist.pdf}
\caption{First two features of gallery(o) and query(x) feature data}
\label{fig:subspace}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/clusteracc.pdf}
\caption{Top k identification accuracy for cluster count}
\label{fig:clustk}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/jaccard.pdf}
\caption{Explained Jaccard}
\label{fig:jaccard}
\end{center}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/mahalanobis.pdf}
\caption{Explained Mahalanobis}
\label{fig:mahalanobis}
\end{center}
\end{figure}