author    Vasil Zlatanov <v@skozl.com>  2018-12-14 11:15:09 +0000
committer Vasil Zlatanov <v@skozl.com>  2018-12-14 11:15:09 +0000
commit    73248e8a915f0b9a85bf5cabde074164b0146690 (patch)
tree      29a3e253fab5750b6f6802d27e150adae14d4ff6 /report
parent    9cc82cd46d1db60cda474dfd28ac19d66180e576 (diff)
Revise section 1
Diffstat (limited to 'report')
-rw-r--r--  report/paper.md  |  52
1 file changed, 29 insertions(+), 23 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index f9fb19a..7108b4b 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -7,36 +7,43 @@ pedestrian images from disjoint cameras by pedestrian detectors. This problem is
challenging, as identities captured in photos are subject to various lighting, pose,
blur, background and occlusion from different camera views. This report considers
features extracted from the CUHK03 dataset, following a 50-layer Residual network
-(ResNet50). Different distance metrics techniques can be used to
-perform person re-identification across *disjoint* cameras.
+(ResNet-50). Different distance metric techniques can be used to
+perform person re-identification across the *disjoint* cameras.
-Features extracted from Neural Networks such as ResNet-50 are already highly processed. We therefore expect it to be extremely hard to further optimise the feature vectors for data separation, but we may be able benefit
+Features extracted from Neural Networks such as ResNet-50 are already highly processed.
+We therefore expect it to be extremely hard to further optimise the feature
+vectors for data separation, but may be able to benefit
from alternative neighbour matching algorithms that take into account the
relative positions of the nearest neighbours with respect to the probe and to each other.
## Dataset - CUHK03 Summary
The dataset CUHK03 contains 14096 pictures of people captured from two
-different cameras. The feature vectors used, extracted from a trained ResNet50 model
+different cameras. The feature vectors used, extracted from a trained ResNet-50
model, contain 2048 features that are used for identification.
The pictures represent 1467 different identities, each of which appears 7 to 10
-times. Data is separated in train, query and gallery sets with `train_idx`,
+times. Data is separated into training, query and gallery sets with `train_idx`,
`query_idx` and `gallery_idx` respectively, where the training set has been used
-to develop the ResNet50 model used for feature extraction. This procedure has
+to develop the ResNet-50 model used for feature extraction. This procedure has
allowed the evaluation of distance metric learning techniques on the query and
-gallery sets, with the knowledge that we are not comparing features produced by a net over-fitted on them, as they were extracted based on the model derived from the training set.
+gallery sets, with the knowledge that we are not comparing features produced
+by a neural network which was specifically (over-)fitted on them, as they were
+extracted using the model derived from the training set.
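
As an illustrative sketch only, the split amounts to plain index selection
(file and variable names below are hypothetical, not part of the provided
dataset):

```python
import numpy as np

# Hypothetical file names; the actual storage format of the provided
# features and index arrays may differ.
features = np.load("cuhk03_resnet50_features.npy")  # shape: (14096, 2048)
labels   = np.load("cuhk03_labels.npy")             # identity of each image

train_idx   = np.load("train_idx.npy")
query_idx   = np.load("query_idx.npy")
gallery_idx = np.load("gallery_idx.npy")

train_feat   = features[train_idx]
query_feat   = features[query_idx]
gallery_feat = features[gallery_idx]
query_lab, gallery_lab = labels[query_idx], labels[gallery_idx]
```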
## Nearest Neighbour rank-list
-Nearest Neighbor aims to find the gallery image whose feature are the closest to
-the ones of a query image, predicting the class of the query image as the same
-of its nearest neighbor(s). The distance between images can be calculated through
-different distance metrics, however one of the most commonly used is euclidean
-distance:
+Nearest Neighbor can be used to find gallery images with features close to
+those of a query image, predicting the class or identity of the query image as
+the same as that of its nearest neighbour(s), based on distance.
+The distance between images can be calculated using different metrics.
+
+The most commonly used is the Euclidean distance:
$$ \textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\|. $$
+We further consider Mahalanobis and Jaccard distance in this paper.
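
As a minimal sketch of the resulting rank-list (reusing the hypothetical
arrays from the earlier snippet; this is not the code used to produce our
results):

```python
import numpy as np

def rank_list(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted by Euclidean distance to the query."""
    dists = np.linalg.norm(gallery - query, axis=1)  # ||x - x_i|| per gallery image
    return np.argsort(dists)                         # nearest neighbour first

# Top-1 prediction: the identity of the single nearest gallery image.
# pred = gallery_lab[rank_list(query_feat[0], gallery_feat)[0]]
```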
+
# Baseline Evaluation
To evaluate the improvements brought by alternative distance metric learning techniques, a baseline
@@ -73,37 +80,36 @@ be due to the fact that we are removing feature scaling that was introduced by the
network, such that some of the features are more significant than others. By standardising our
features at this point, we remove such scaling and may lose useful information.
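
For reference, the standardisation in question is an ordinary z-score
transform; a sketch, assuming the hypothetical arrays from the earlier
snippets and training-set statistics:

```python
# z-score standardisation using training-set statistics; this removes the
# per-feature scaling produced by the network, as discussed above.
mu    = train_feat.mean(axis=0)
sigma = train_feat.std(axis=0) + 1e-8   # guard against zero-variance features
query_std   = (query_feat   - mu) / sigma
gallery_std = (gallery_feat - mu) / sigma
```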
-## kMeans Clustering
+## $k$-Means Clustering
-An addition considered for the baseline is *kMeans clustering*. In theory this
-method allows to reduce computational complexity of the baseline NN by forming clusters
+An additional baseline technique considered is *$k$-Means clustering*. Clustering
+methods may reduce the computational complexity of the baseline NN by forming clusters
and performing a comparison between the query image and the cluster centers. The elements
associated with the closest cluster center are then considered to perform NN and
classify the query image.
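
A sketch of this cluster-then-NN scheme with scikit-learn (a plausible
implementation under our naming assumptions, not necessarily the exact one
used here):

```python
import numpy as np
from sklearn.cluster import KMeans

# Fit once on the gallery; km.labels_ holds each image's cluster assignment.
km = KMeans(n_clusters=38, n_init=10).fit(gallery_feat)

def cluster_nn(query: np.ndarray) -> int:
    """Assign the query to its closest cluster center, then run NN inside it."""
    center = np.argmin(np.linalg.norm(km.cluster_centers_ - query, axis=1))
    members = np.flatnonzero(km.labels_ == center)
    nearest = members[np.argmin(np.linalg.norm(gallery_feat[members] - query, axis=1))]
    return gallery_lab[nearest]
```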
-This method did not bring any major improvement to the baseline, as it can be seen from
+Clustering with $k$-Means, however, did not bring any major improvement to the baseline, as can be seen from
figure \ref{fig:baselineacc}. It is noticeable how the number of clusters affects
performance, showing better identification accuracy for a number of clusters away from
-the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). This trend can likely
-be explained by the number of distance comparisons performed.
+the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}).
+This trend can likely be explained by the number of distance comparisons performed.
We would expect clustering with $k=1$ and $k=\textrm{label count}$ to have the same performance
as the baseline approach without clustering, since we perform the same number of comparisons.
-
-Clustering is a great method of reducing computation time. Assuming 38 clusters of 38 neighbours
+We find that clustering is an effective way to reduce computation time. Assuming 38 clusters of 38 neighbours each,
we would perform only 76 distance computations for a gallery size of 1467, instead of the
original 1467. This, however, comes at the cost of ignoring neighbours from other clusters which may
be closer. Since clusters do not necessarily contain the same number of datapoints
(sizes are uneven), we find that the lowest average number of comparisons occurs at around 60 clusters,
which also appears to be the worst-performing number of clusters.
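
Under the simplifying assumption of evenly sized clusters, the number of
comparisons is the number of centers plus the size of one cluster, which makes
the figures above easy to verify:

$$ C(k) = k + \frac{N}{k}, \qquad \operatorname*{argmin}_k C(k) = \sqrt{N} = \sqrt{1467} \approx 38.3, $$

giving $C(38) \approx 76.6$ comparisons; the uneven cluster sizes observed in
practice shift the empirical minimum towards roughly 60 clusters.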
-We find that for the query and gallery set clustering does not seem to
-improve identification accuracy, and consider it an additional baseline.
+Since we find that clustering does not improve identification accuracy
+on the query and gallery sets, we consider it an additional baseline.
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/kmeanacc.pdf}
-\caption{Top 1 Identification accuracy varying kmeans cluster size}
+\caption{Top-1 identification accuracy, varying the number of $k$-Means clusters}
\label{fig:kmeans}
\end{center}
\end{figure}