# Formulation of the Addressed Machine Learning Problem

## Probelm Definition

The person re-identification problem presented in this paper requires matching 
pedestrian images from disjoint cameras by pedestrian detectors. This problem is 
challenging, as identities captured in photos are subject to various lighting, pose, 
blur, background and occlusion from various camera views. This report considers 
features extracted from the CUHK03 dataset, following a 50 layer Residual network 
(ResNet50). Different distance metrics techniques can be used to 
perform person re-identification across *disjoint* cameras.

Features extracted from Neural Networks such as ResNet-50 are already highly processed 
and represent **features** extracted from the raw data. We therefore expect it to be extremely 
hard to further optimise the feature vectors for data separation, but we may be able benefit 
from alternative neighbour matching algorithms that take into account the 
relative positions of the nearest neighbours with the probe and each other.

## Dataset - CUHK03  Summary

The dataset CUHK03 contains 14096 pictures of people captured from two 
different cameras. The feature vectors used, extracted from a trained ResNet50 model
, contain 2048 features that are used for identification.

The pictures represent 1467 different identities, each of which appears 7 to 10
times. Data is separated in train, query and gallery sets with `train_idx`, 
`query_idx` and `gallery_idx` respectively, where the training set has been used 
to develop the ResNet50 model used for feature extraction. This procedure has 
allowed the evaluation of distance metric learning techniques on the query and 
gallery sets, with the knowledge that we are not comparing over-fitted features,
as they were extracted based on the model derived from the training set. 

## Nearest Neighbor rank-list

Nearest Neighbor aims to find the gallery image whose feature are the closest to
the ones of a query image, predicting the class of the query image as the same 
of its nearest neighbor(s). The distance between images can be calculated through
different distance metrics, however one of the most commonly used is euclidean
distance:

$$ \textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\|. $$

# Baseline Evaluation

To evaluate improvements brought by alternative distance learning metrics a baseline 
is established through generic single nearest neighbour identification as outlined above. 
Identification accuracies at top1, top5 and top10 are respectively 47%, 67% and 75%
(figure \ref{fig:baselineacc}). The mAP is 47.2%. This is a *good* result and is a traditional
classification and identification technique which is effective with good feature transformation.

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/baseline.pdf}
\caption{Top k identification accuracy of baseline Nearest Neighbor}
\label{fig:baselineacc}
\end{center}
\end{figure}

Figure \ref{fig:eucrank} shows the rank-list generated through baseline NN for 
5 query images (black). Correct identification is shown in green and incorrect 
identification is shown in red.

\begin{figure}
\begin{center}
\includegraphics[width=22em]{fig/eucranklist.png}
\caption{Top 10 rank-list generated for 5 query images}
\label{fig:eucrank}
\end{center}
\end{figure}

Magnitude normalization is a common technique, used to equalize feature importance. 
Applying magnitude normalization (scaling feature vectors to unit length) had a negative 
effect re-identification. Furthemore standartization by removing feature mean and deviation 
also had negative effect on performance as seen on figure \ref{fig:baselineacc}. This may 
be due to the fact that we are removing feature scaling that was introduced by the Neural Network, 
such that some of the features are more significant than others. By standartizing our 
features at this point, we remove such scaling and may be losing useful information.

## kMeans Clustering

An addition considered for the baseline is *kMeans clustering*. In theory this 
method allows to reduce computational complexity of the baseline NN by forming clusters 
and performing a comparison between query image and clusters centers. The elements
associated with the closest cluster center are then considered to perform NN and
classify the query image. 

This method did not bring any major improvement to the baseline, as it can be seen from
figure \ref{fig:baselineacc}. It is noticeable how the number of clusters affects
performance, showing better identification accuracy for a number of clusters away from
the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). This trend can likely 
be explained by the number of distance comparisons performed.

We would expect clustering with $k=1$ and $k=\textrm{label count}$ to have the same performance
of the baseline approach without clustering, as we are performing the same number of comparisons.

Clustering is a great method of reducing computation time. Assuming 38 clusters of 38 neighbours 
we would be performing only 76 distance computations for a gallery size of 1467, instead of the 
original 1467. This however comes at the cost of ignoring neighbours from other clusters which may 
be closer. Since clusters do not necessarily have the same number of datapoints inside them 
(sizes are uneven), we find that the lowest average number of comparison happens at around 60 clusters, 
which also appears to be the worst performing number of clusters. 

We find that for the query and gallery set clustering does not seem to 
improve identification accuracy, and consider it an additional baseline.

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/kmeanacc.pdf}
\caption{Top 1 Identification accuracy varying kmeans cluster size}
\label{fig:kmeans}
\end{center}
\end{figure}

## Mahalanobis Distance

We were not able to achieve significant improvements using Mahalanobis for 
original distance ranking compared to square euclidiaen metrics. 

The Mahalanobis distance metric was used to create the ranklist as an alternative to euclidean distance:

$$ d_M(p,g_i) = (p-g_i)^TM(p-g_i). $$

When performing Mahalanobis with the covariance matrix $M$ generated from the training set, reported accuracy is reduced to 38%.

We also attempted to perform the same Mahalanobis metric on a reduced PCA featureset. PCA performed with the top 100 eigenvectors
reduces the feature space while keeping 94% of the data variance. This allowed for significant execution 
time improvements due to the greatly reduced computation requierments for smaller featurespace, but nevertheless demonstrated no
improvements over an euclidean metric.

These results are likely due to the extremely low covariance of features in the training set. 
This is evident when looking at the covariance matrix of the training data, and is also 
visible in figure \ref{fig:subspace}. This is likely the result of the feature 
transformations performed by the ResNet-50 convolution model from which features were extracted.

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/cdist.pdf}
\includegraphics[width=12em]{fig/train_subspace.pdf}
\caption{Left:first two features of gallery(o) and query(x) data for 3 labels; Right:First two features of train data for three labels}
\label{fig:subspace}
\end{center}
\end{figure}

While we did not use Mahalanobis as a primary distance metric, it is possible to use the Mahalanobis metric, 
together with the next investigated solution involving $k$-reciprocal re-ranking.

# Suggested Improvement

## $k$-reciprocal Re-ranking Formulation

The approach addressed to improve the identification performance is based on
$k$-reciprocal re-ranking. The following section summarizes the idea behind
the method illustrated in reference @rerank-paper.

We define $N(p,k)$ as the top $k$ elements of the rank-list generated through NN,
where $p$ is a query image. The k reciprocal rank-list, $R(p,k)$ is defined as the
intersection $R(p,k)=\{g_i|(g_i \in N(p,k))\land(p \in N(g_i,k))\}$. Adding 
$\frac{1}{2}k$ reciprocal nearest neighbors of each element in the rank-list
$R(p,k)$, it is possible to form a more reliable set 
$R^*(p,k) \longleftarrow R(p,k) \cup R(q,\frac{1}{2}k)$ that aims to overcome 
the problem of query and gallery images being affected by factors such
as position, illumination and foreign objects. $R^*(p,k)$ is used to 
recalculate the distance between query and gallery images.

Jaccard metric of the $k$-reciprocal sets is used to calculate the distance 
between $p$ and $g_i$ as: $$d_J(p,g_i)=1-\frac{|R^*(p,k)\cap R^*(g_i,k)|}{|R^*(p,k)\cup R^*(g_i,k)|}.$$

However, since the neighbors of the query $p$ are close to $g_i$ as well, 
they would be more likely to be identified as true positive. This implies
the need of a more discriminative method, which is achieved
encoding the $k$-reciprocal neighbors into an $N$-dimensional vector as a function
of the original distance (in our case square euclidean $d(p,g_i) = \|p-g_i\|^2$) 
through the gaussian kernell:

\begin{equation}
\textit{V\textsubscript{p,g\textsubscript{i}}}=
\begin{cases}
e\textsuperscript{\textit{-d(p,g\textsubscript{i})}}, & \text{if}\ \textit{g\textsubscript{i}}\in \textit{R\textsuperscript{*}(p,k)} \\
0, & \text{otherwise.}
\end{cases}
\end{equation}

Through this transformation it is possible to reformulate the distance obtained 
through Jaccardian metric as: 

$$ d_J(p,g_i)=1-\frac{\sum\limits_{j=1}^N min(V_{p,g_j},V_{g_i,g_j})}{\sum\limits_{j=1}^N max(V_{p,g_j},V_{g_i,g_j})}. $$

It is then possible to perform a local query expansion using the g\textsubscript{i} neighbors of
defined as:
$$ V_p=\frac{1}{|N(p,k_2)|}\sum\limits_{g_i\in N(p,k_2)}V_{g_i}. $$ 
We refer to $k_2$ since we limit the size of the nighbors to prevent noise 
from the $k_2$ neighbors. The dimension k of the *$R^*$* set will instead 
be defined as $k_1$: $R^*(g_i,k_1)$.

The distances obtained are then mixed, obtaining a final distance $d^*(p,g_i)$ that is used to obtain the
improved rank-list: $d^*(p,g_i)=(1-\lambda)d_J(p,g_i)+\lambda d(p,g_i)$.

## Optimisation
The aim is to learn optimal values for $k_1,k_2$ and $\lambda$ in the training set that improve top1 identification accuracy.
This is done through a simple multi-direction search algorithm followed by exhaustive search to estimate 
$k_{1_{opt}}$ and $k_{2_{opt}}$ for eleven values of $\lambda$ from zero (only Jaccard distance) to one (only original distance)
in steps of 0.1. The results obtained through this approach suggest: $k_{1_{opt}}=9, k_{2_{opt}}=3, 0.1\leq\lambda_{opt}\leq 0.3$.

It is possible to verify that the optimisation of $k_{1_{opt}}$, $k_{2_{opt}}$ and $\lambda$
has been successful. Figures \ref{fig:pqvals} and \ref{fig:lambda} show that the optimal values obtained from 
training are close to the ones for the local maximum of gallery and query.

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/pqvals.pdf}
\includegraphics[width=12em]{fig/trainpqvals.pdf}
\caption{Identification accuracy varying K1 and K2 (gallery-query left, train right) KL=0.3}
\label{fig:pqvals}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/lambda_acc.pdf}
\includegraphics[width=12em]{fig/lambda_acc_tr.pdf}
\caption{Top 1 Identification Accuracy with Re-rank varying lambda(gallery-query left, train right) K1=9, K2=3}
\label{fig:lambda}
\end{center}
\end{figure}

## $k$-reciprocal Re-ranking Evaluation 

Re-ranking achieves better results than the other baseline methods analyzed both as top $k$
accuracy and mean average precision. 
It is also necessary to estimate how precise the rank-list generated is.
For this reason an additional method of evaluation is introduced: mAP. See reference @mAP.

It is possible to see in figure \ref{fig:ranklist2} how the rank-list generated for the same five queries of figure \ref{fig:eucrank} 
has improved for the fifth query. The mAP improves from 47.2% to 61.7%.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/ranklist.png}
\caption{Top 10 Rank-list (improved method) generated for 5 query images}
\label{fig:ranklist2}
\end{center}
\end{figure}

Figure \ref{fig:compare} shows a comparison between top $k$ identification accuracies
obtained with the two methods. It is noticeable that the $k$-reciprocal re-ranking method significantly
improves the results even for top$1$, boosting identification accuracy from 47% to 56.5%.
The difference between the top $k$ accuracies of the two methods gets smaller as we increase $k$.

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/comparison.pdf}
\caption{Top K (@rank) Identification accuracy (KL=0.3,K1=9,K2=3)}
\label{fig:compare}
\end{center}
\end{figure}

The improved results due to $k$-reciprocal re-ranking may be explained by considering that re-ranks based on second order neighbours,
that is, the neighbours of the neighbours. For neighbours which display identifiable features, such as a backpack or binder that is
not visible in the query but visible in a close neighbour, the reranking algorithm is able to infer **that the strong relationship based on this newly introduced
feature such as a backpack or folder by the neighbour, uniquely identify other identities in the gallery with the same feature, and moving them up the rankinlist 
as a result despite the identifiable feature being hidden in the query**. An example of this can be seen in figure \ref{fig:rerank}.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/rerank.pdf}
\caption{Rerank example}
\label{fig:rerank}
\end{center}
\end{figure}


# Conclusions

Overall the reranking method gives a significant improvement to both top $k$ accuracy and mean average precision.
The cost of this operation is an increase in computation time due to the change in complexity from
the baseline, summarized in table \ref{tab:complexity}.

\begin{table}[]
\centering
\begin{tabular}{ll}
\hline
\textbf{Process}                 & \textbf{Complexity}  \\ \hline
Rerank - Distances calculation   & \textit{O(N\textsuperscript{2})}     \\ 
Rerank - Ranking                 & \textit{O(N\textsuperscript{2}logN)} \\ 
Baseline - Distances calculation & \textit{O(N\textsuperscript{2})}     \\ 
Baseline - Ranking               & \textit{O(N\textsuperscript{2})}     \\  
\end{tabular}
\caption{Complexity evaluation}
\label{tab:complexity}
\end{table}


# References