aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorVasil Zlatanov <v@skozl.com>2018-12-13 16:31:04 +0000
committerVasil Zlatanov <v@skozl.com>2018-12-13 16:31:04 +0000
commita95c5919bb211ba43d3189f656cc51ed341135bd (patch)
tree1372e2c9ab6a34c0d3feb3ccf2c2a68236c5e477
parent593e701b22513809816359f14a5045528cc50bef (diff)
downloadvz215_np1915-a95c5919bb211ba43d3189f656cc51ed341135bd.tar.gz
vz215_np1915-a95c5919bb211ba43d3189f656cc51ed341135bd.tar.bz2
vz215_np1915-a95c5919bb211ba43d3189f656cc51ed341135bd.zip
Fix grammer/spelling/lines
-rwxr-xr-xreport/paper.md91
1 files changed, 51 insertions, 40 deletions
diff --git a/report/paper.md b/report/paper.md
index 79d4183..ecf1e3e 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -1,15 +1,21 @@
-# Forulation of the Addresssed Machine Learning Problem
+# Formulation of the Addressed Machine Learning Problem
## Probelm Definition
The person re-identification problem presented in this paper requires matching
pedestrian images from disjoint cameras by pedestrian detectors. This problem is
-challenging, as identities captured in photsos are subject to various lighting, pose,
+challenging, as identities captured in photos are subject to various lighting, pose,
blur, background and oclusion from various camera views. This report considers
features extracted from the CUHK03 dataset, following a 50 layer Residual network
(Resnet50). This paper considers distance metrics techniques which can be used to
perform person re-identification across *disjoint* cameras, using these features.
+Features extracted from Neural Networks such as ResNet-50 are already highly processed
+and represent features extracted from the raw data. We therefore expect it to be extremely
+hard to further optimise the feature vectors for data separation, but we may be able benefit
+from alternative neighbour matching algorithms that take into account the
+relative positions of the nearest neighbours with the probe and each other.
+
## Dataset - CUHK03 Summary
The dataset CUHK03 contains 14096 pictures of people captured from two
@@ -17,21 +23,14 @@ different cameras. The feature vectors used, extracted from a trained ResNet50 m
, contain 2048 features that are used for identification.
The pictures represent 1467 different identities, each of which appears 9 to 10
-times. Data is seperated in train, query and gallery sets with `train_idx`,
+times. Data is separated in train, query and gallery sets with `train_idx`,
`query_idx` and `gallery_idx` respectively, where the training set has been used
to develop the ResNet50 model used for feature extraction. This procedure has
allowed the evaluation of distance metric learning techniques on the query and
-gallery sets, with the knowledge that we are not comparing overfitted features,
+gallery sets, with the knowledge that we are not comparing over-fitted features,
as they were extracted based on the model derived from the training set.
-## Probelm to solve
-
-The problem to solve is to create a ranklist for each image of the query set
-by finding the nearest neighbor(s) within a gallery set. However gallery images
-with the same label and taken from the same camera as the query image should
-not be considered when forming the ranklist.
-
-## Nearest Neighbor ranklist
+## Nearest Neighbor rank-list
Nearest Neighbor aims to find the gallery image whose feature are the closest to
the ones of a query image, predicting the class of the query image as the same
@@ -41,15 +40,13 @@ distance:
$$ \textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\| $$
-Alternative distance metrics exist such as jaccardian and mahalanobis, which can
-be used as an alternative to euclidiean distance.
-
# Baseline Evaluation
To evaluate improvements brought by alternative distance learning metrics a baseline
-is established through nearest neighbour identification as previously described.
+is established through generic single nearest neighbour identification as outlined above.
Identification accuracies at top1, top5 and top10 are respectively 47%, 67% and 75%
-(figure \ref{fig:baselineacc}). The mAP is 47.2%.
+(figure \ref{fig:baselineacc}). The mAP is 47.2%. This is a *good* result and is a traditional
+classification and identification technique which is effective with good feature transformation.
\begin{figure}
\begin{center}
@@ -59,43 +56,52 @@ Identification accuracies at top1, top5 and top10 are respectively 47%, 67% and
\end{center}
\end{figure}
-Figure \ref{fig:eucrank} shows the ranklist generated through baseline NN for
-5 query images(black). Correct identification is shown in green and incorrect
+Figure \ref{fig:eucrank} shows the rank-list generated through baseline NN for
+5 query images (black). Correct identification is shown in green and incorrect
identification is shown in red.
\begin{figure}
\begin{center}
\includegraphics[width=22em]{fig/eucranklist.png}
-\caption{Top 10 ranklist generated for 5 query images}
+\caption{Top 10 rank-list generated for 5 query images}
\label{fig:eucrank}
\end{center}
\end{figure}
-Magnitude normalization of the feature vectors does not improve
-accuracy results of the baseline as it can be seen in figure \ref{fig:baselineacc}.
-This is due to the fact that the feature vectors appear to be scaled, releative to their
-significance, for optimal distance classification, and as such normalising loses this
-scaling by importance which may have previously been introduced by the network.
+Magnitude normalization is a common technique, used to equalize feature importance.
+Applying magnitude normalization (scaling feature vectors to unit length) had a negative
+effect re-identification. Furthemore standartization by removing feature mean and deviation
+also had negative effect on performance as seen on figure \label{fig:baselineacc}. This may
+be due to the fact that we are removing feature scaling that was introduced by the Neural network,
+such that some of the features are more significant than the others. By standartizing our
+features at this point, we remove such scaling and may be losing using useful metrics.
## kMeans Clustering
-An addition considered for the baseline is *kMeans clustering*. In theory this method
-allows to reduce computational complexity of the baseline NN by forming clusters and
-performing a comparison between query image and clusters centers. The elements
+An addition considered for the baseline is *kMeans clustering*. In theory this
+method allows to reduce computational complexity of the baseline NN by forming clusters
+and performing a comparison between query image and clusters centers. The elements
associated with the closest cluster center are then considered to perform NN and
classify the query image.
This method did not bring any major improvement to the baseline, as it can be seen from
figure \ref{fig:baselineacc}. It is noticeable how the number of clusters affects
performance, showing better identification accuracy for a number of clusters away from
-the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). This trend can likely be explained by the number of distance comparison's performed.
+the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). This trend can likely
+be explained by the number of distance comparison's performed.
We would expect clustering with $k=1$ and $k=\textrm{label count}$ to have the same performance
the baseline approach without clustering, as we are performing the same number of comparisons.
-Clustering is a great method of reducing computation time. Assuming 39 clusters of 39 neighbours we would be performing only 78 distance computation for a gallery size of 1487, instead of the original 1487. This however comes at the cost of ignoring neighbours from other clusters which may be closer. Since clusters do not necessarily have the same number of datapoints inside them (sizes are uneven), we find that the lowest average number of comparison happens at around 60 clusters, which also appears to be the worst performing number of clusters.
+Clustering is a great method of reducing computation time. Assuming 39 clusters of 39 neighbours
+we would be performing only 78 distance computation for a gallery size of 1487, instead of the
+original 1487. This however comes at the cost of ignoring neighbours from other clusters which may
+be closer. Since clusters do not necessarily have the same number of datapoints inside them
+(sizes are uneven), we find that the lowest average number of comparison happens at around 60 clusters,
+which also appears to be the worst performing number of clusters.
-We find that for the query and gallery set clustering does not seem to improve identification accuracy, and consider it an additional baseline.
+We find that for the query and gallery set clustering does not seem to
+improve identification accuracy, and consider it an additional baseline.
\begin{figure}
\begin{center}
@@ -112,14 +118,17 @@ We find that for the query and gallery set clustering does not seem to improve i
We were not able to achieve significant improvements using mahalanobis for
original distance ranking compared to square euclidiaen metrics.
-The mahalanobis distance metric was used to create the ranklist as an alternative to euclidean distance.
+The mahalanobis distance metric was used to create the rank-list as an alternative to euclidean distance.
When performing mahalanobis with the training set as the covariance matrix, reported accuracy is reduced to **38%** .
We also attempted to perform the same mahalanobis metric on a reduced PCA featureset. This allowed for significant execution
time improvements due to the greatly reduced computation requierments for smaller featurespace, but nevertheless demonstrated no
improvements over an euclidean metric.
-These results are likely due to the **extremely** low covariance of features in the training set. This is evident when looking at the Covariance matrix of the training data, and is also visible in figure \ref{fig:subspace}. This is likely the result of the feature transformations performed the the ResNet-50 convolution model the features were extracted from.
+These results are likely due to the **extremely** low covariance of features in the training set.
+This is evident when looking at the Covariance matrix of the training data, and is also
+visible in figure \ref{fig:subspace}. This is likely the result of the feature
+transformations performed the the ResNet-50 convolution model the features were extracted from.
\begin{figure}
\begin{center}
@@ -130,16 +139,18 @@ These results are likely due to the **extremely** low covariance of features in
\end{center}
\end{figure}
+While we did not use mahalanobis as a primary distance metric, it is possible to use the Mahalanobis metric, together with the next investigated solution $k
+
## $k$-reciprocal Re-ranking Formulation
The approach addressed to improve the identification performance is based on
$k$-reciprocal re-ranking. The following section summarizes the idea behind
the method illustrated in reference @rerank-paper.
-We define $N(p,k)$ as the top $k$ elements of the ranklist generated through NN,
-where $p$ is a query image. The k reciprocal ranklist, $R(p,k)$ is defined as the
+We define $N(p,k)$ as the top $k$ elements of the rank-list generated through NN,
+where $p$ is a query image. The k reciprocal rank-list, $R(p,k)$ is defined as the
intersection $R(p,k)=\{g_i|(g_i \in N(p,k))\land(p \in N(g_i,k))\}$. Adding
-$\frac{1}{2}k$ reciprocal nearest neighbors of each element in the ranklist
+$\frac{1}{2}k$ reciprocal nearest neighbors of each element in the rank-list
$R(p,k)$, it is possible to form a more reliable set
$R^*(p,k) \longleftarrow R(p,k) \cup R(q,\frac{1}{2}k)$ that aims to overcome
the problem of query and gallery images being affected by factors such
@@ -177,7 +188,7 @@ from the $k_2$ neighbors. The dimension k of the *$R^*$* set will instead
be defined as $k_1$: $R^*(g_i,k_1)$.
The distances obtained are then mixed, obtaining a final distance $d^*(p,g_i)$ that is used to obtain the
-improved ranklist: $d^*(p,g_i)=(1-\lambda)d_J(p,g_i)+\lambda d(p,g_i)$.
+improved rank-list: $d^*(p,g_i)=(1-\lambda)d_J(p,g_i)+\lambda d(p,g_i)$.
The aim is to learn optimal values for $k_1,k_2$ and $\lambda$ in the training set that improve top1 identification accuracy.
This is done through a simple multi-direction search algorithm followed by exhaustive search to estimate
@@ -211,16 +222,16 @@ training are close to the ones for the local maximum of gallery and query.
Re-ranking achieves better results than the other baseline methods analyzed both as top $k$
accuracy and mean average precision.
-It is also necessary to estimate how precise the ranklist generated is.
+It is also necessary to estimate how precise the rank-list generated is.
For this reason an additional method of evaluation is introduced: mAP. See reference @mAP.
-It is possible to see in figure \ref{fig:ranklist2} how the ranklist generated for the same five queries of figure \ref{fig:eucrank}
+It is possible to see in figure \ref{fig:ranklist2} how the rank-list generated for the same five queries of figure \ref{fig:eucrank}
has improved for the fifth query. The mAP improves from 47.2% to 61.7%.
\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/ranklist.png}
-\caption{Top 10 Ranklist (improved method) generated for 5 query images}
+\caption{Top 10 Rank-list (improved method) generated for 5 query images}
\label{fig:ranklist2}
\end{center}
\end{figure}