# Formulation of the Addressed Machine Learning Problem

## Problem Definition

The person re-identification problem addressed in this report requires matching pedestrian images, produced by pedestrian detectors, across disjoint cameras. The problem is challenging because captured identities are subject to variations in lighting, pose, blur, background and occlusion across the different camera views.

This report considers features extracted from the CUHK03 dataset using a 50-layer Residual Network (ResNet-50). Different distance metric techniques can be used to perform person re-identification across the *disjoint* cameras. Features extracted from neural networks such as ResNet-50 are already highly processed. We therefore expect it to be extremely hard to further optimise the feature vectors for data separation, but we may benefit from alternative neighbour-matching algorithms that take into account the positions of the nearest neighbours relative to the probe and to each other.

## Dataset - CUHK03 Summary

The CUHK03 dataset contains 14096 pictures of people captured by two different cameras. The feature vectors, extracted from a trained ResNet-50 model, contain 2048 features each. The pictures represent 1467 different identities, each of which appears 7 to 10 times. The data is separated into training, query and gallery sets through `train_idx`, `query_idx` and `gallery_idx` respectively, where the training set was used to develop the ResNet-50 model used for feature extraction. This allows distance metric learning techniques to be evaluated on the query and gallery sets with the knowledge that the compared features were not produced by a network specifically (over-)fitted on them, since the model was derived from the training set alone.

## Nearest Neighbour Rank-list

Nearest Neighbour (NN) search can be used to find the gallery images whose features are closest to those of a query image, predicting the class or identity of the query image as that of its nearest neighbour(s). The distance between images can be calculated through different distance metrics, the most common being the Euclidean distance:
$$
\textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\|.
$$
We further consider the Mahalanobis and Jaccard distances in this report.

# Baseline Evaluation

To evaluate the improvements brought by alternative distance metrics, a baseline is established through generic single nearest neighbour identification as outlined above. Identification accuracies at top-1, top-5 and top-10 are 47%, 67% and 75% respectively (figure \ref{fig:baselineacc}), with a mAP of 47.2%. This is a *good* result for a traditional classification and identification technique, which is effective when paired with a good feature transformation.

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/baseline.pdf}
\caption{Top $k$ identification accuracy of baseline Nearest Neighbour}
\label{fig:baselineacc}
\end{center}
\end{figure}

Figure \ref{fig:eucrank} shows the rank-lists generated through baseline NN for 5 query images (black). Correct identifications are shown in green and incorrect ones in red.

\begin{figure}
\begin{center}
\includegraphics[width=22em]{fig/eucranklist.png}
\caption{Top 10 rank-lists generated for 5 query images}
\label{fig:eucrank}
\end{center}
\end{figure}
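For concreteness, the following is a minimal sketch of the baseline rank-list and top-$k$ accuracy computation. The array names (`query_feats`, `gallery_feats`, `query_labels`, `gallery_labels`) are hypothetical placeholders for the extracted ResNet-50 features and identity labels, and the per-camera filtering of the CUHK03 evaluation protocol is omitted for brevity:

```python
import numpy as np
from scipy.spatial.distance import cdist

def topk_accuracy(query_feats, gallery_feats, query_labels, gallery_labels,
                  ks=(1, 5, 10)):
    """Top-k identification accuracy of the Euclidean NN baseline."""
    # Pairwise Euclidean distances: one row per query image.
    dists = cdist(query_feats, gallery_feats, metric='euclidean')
    # Rank-list: gallery indices sorted by increasing distance.
    ranklists = np.argsort(dists, axis=1)
    hits = gallery_labels[ranklists] == query_labels[:, None]
    # A query is correct at rank k if any of its first k neighbours
    # shares its identity.
    return {k: hits[:, :k].any(axis=1).mean() for k in ks}
```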
Magnitude normalisation is a common technique used to equalise feature importance. Applying it (scaling feature vectors to unit length) had a negative effect on re-identification. Furthermore, standardisation, i.e. removing the feature means and scaling by the standard deviations, also degraded performance, as seen in figure \ref{fig:baselineacc}. This may be because we are removing feature scaling introduced by the neural network, under which some features are more significant than others; by standardising the features at this point we remove that scaling and may lose useful information.

## $k$-Means Clustering

An additional baseline technique considered is *$k$-Means clustering*. Clustering may reduce the computational complexity of the baseline NN by forming clusters and comparing the query image against the cluster centres only. The elements associated with the closest cluster centre are then used to perform NN and classify the query image. Clustering with $k$-Means, however, did not bring any major improvement to the baseline, as can be seen from figure \ref{fig:baselineacc}. It is noticeable how the number of clusters affects performance, with better identification accuracy for cluster counts away from the local minimum at 60 clusters (figure \ref{fig:kmeans}). This trend can likely be explained by the number of distance comparisons performed. We would expect clustering with $k=1$ and $k=\textrm{label count}$ to have the same performance as the baseline approach without clustering, as we perform the same number of comparisons.

Clustering is, however, an effective method of reducing computation time. Assuming 38 clusters of 38 neighbours each, we would perform only 76 distance computations for a gallery size of 1467, instead of the original 1467. This comes at the cost of ignoring neighbours in other clusters which may in fact be closer. Since clusters do not necessarily contain the same number of data points (sizes are uneven), the lowest average number of comparisons occurs at around 60 clusters, which is also the worst performing cluster count. Since clustering does not improve identification accuracy on the query and gallery sets, we consider it an additional baseline.

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/kmeanacc.pdf}
\caption{Top 1 identification accuracy varying the number of $k$-Means clusters}
\label{fig:kmeans}
\end{center}
\end{figure}

## Mahalanobis Distance

The Mahalanobis distance is an extremely useful measure of the distance between a point (such as a query image feature vector) and a distribution (such as the gallery), accounting for how many standard deviations the point lies from the mean of the distribution. It is unitless and scale-invariant, in effect standardising the data. We used the Mahalanobis distance as an alternative metric to the Euclidean distance:
$$
d_M(p,g_i) = (p-g_i)^TM(p-g_i).
$$
We were not able to achieve significant improvements using Mahalanobis ranking compared to the squared Euclidean metric. With $M$ set to the inverse of the covariance matrix estimated from the training set, identification accuracy is reduced to 38%. We also attempted to apply the same Mahalanobis metric to a PCA-reduced feature set. PCA performed with the top 100 eigenvectors reduces the feature space while keeping 94% of the data variance.
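A minimal sketch of this experiment is given below, reusing the hypothetical array names from earlier plus a `train_feats` matrix of training features; note that SciPy's Mahalanobis metric expects the *inverse* covariance matrix `VI`:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import PCA

# Project all features onto the top 100 principal components of the
# training set, retaining ~94% of the variance at a fraction of the cost.
pca = PCA(n_components=100).fit(train_feats)
train_red = pca.transform(train_feats)
query_red = pca.transform(query_feats)
gallery_red = pca.transform(gallery_feats)

# Inverse covariance matrix estimated from the reduced training features.
VI = np.linalg.inv(np.cov(train_red, rowvar=False))

# Mahalanobis rank-list over the reduced feature space.
dists = cdist(query_red, gallery_red, metric='mahalanobis', VI=VI)
ranklists = np.argsort(dists, axis=1)
```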
The reduced feature space allowed for significant execution time improvements, thanks to the greatly reduced computation requirements, but nevertheless demonstrated no improvement over the Euclidean metric. These results are likely due to the low covariance between features in the training set. This is evident when looking at the covariance matrix of the training data, and is also visible in figure \ref{fig:subspace}. It is likely the result of the feature transformations performed by the ResNet-50 convolutional model from which the features were extracted. A connection can also be made to the standardisation baseline evaluated previously and the reduction in re-identification accuracy associated with it.

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/cdist.pdf}
\includegraphics[width=12em]{fig/train_subspace.pdf}
\caption{Left: first two features of gallery (o) and query (x) data for 3 labels; Right: first two features of train data for 3 labels}
\label{fig:subspace}
\end{center}
\end{figure}

While we did not use Mahalanobis as the primary distance metric, it may still be used beneficially on differently processed feature sets, together with the next investigated solution involving $k$-reciprocal re-ranking.

# Suggested Improvement

## $k$-reciprocal Re-ranking Formulation

The approach investigated for improving identification performance is based on $k$-reciprocal re-ranking. The following section summarises the technique introduced in @rerank-paper.

We define $N(p,k)$ as the top $k$ elements of the rank-list generated through NN, where $p$ is a query image. The $k$-reciprocal rank-list $R(p,k)$ is defined as the intersection $R(p,k)=\{g_i \mid (g_i \in N(p,k))\land(p \in N(g_i,k))\}$. By adding the $\frac{1}{2}k$-reciprocal nearest neighbours of each element $q$ in the rank-list $R(p,k)$, it is possible to form a more reliable set
$$
R^*(p,k) \longleftarrow R(p,k) \cup R\left(q,\tfrac{1}{2}k\right), \quad \forall q \in R(p,k),
$$
which aims to overcome the problem of query and gallery images being affected by factors such as position, illumination and foreign objects. $R^*(p,k)$ is then used to recalculate the distance between query and gallery images. The Jaccard metric of the $k$-reciprocal sets is used to calculate the distance between $p$ and $g_i$ as:
$$
d_J(p,g_i)=1-\frac{|R^*(p,k)\cap R^*(g_i,k)|}{|R^*(p,k)\cup R^*(g_i,k)|}.
$$
However, since the neighbours of the query $p$ may themselves be close to $g_i$, a more discriminative weighting is needed. This is achieved by encoding the $k$-reciprocal neighbours into an $N$-dimensional vector as a function of the original distance (in our case the squared Euclidean distance $d(p,g_i) = \|p-g_i\|^2$) through a Gaussian kernel:
\begin{equation}
V_{p,g_i}=
\begin{cases}
e^{-d(p,g_i)}, & \text{if}\ g_i \in R^*(p,k) \\
0, & \text{otherwise.}
\end{cases}
\end{equation}
Through this transformation it is possible to reformulate the Jaccard distance as:
$$
d_J(p,g_i)=1-\frac{\sum\limits_{j=1}^N \min(V_{p,g_j},V_{g_i,g_j})}{\sum\limits_{j=1}^N \max(V_{p,g_j},V_{g_i,g_j})}.
$$
It is then possible to perform a local query expansion using the nearest neighbours of $p$:
$$
V_p=\frac{1}{|N(p,k_2)|}\sum\limits_{g_i\in N(p,k_2)}V_{g_i}.
$$
A separate parameter $k_2$ is used here because constraining the size of this neighbourhood limits the noise introduced by the expansion. The neighbourhood size $k$ used to build the $R^*$ sets is instead denoted $k_1$, as in $R^*(g_i,k_1)$. The Jaccard and original distances are then combined into a final distance $d^*(p,g_i)=(1-\lambda)d_J(p,g_i)+\lambda d(p,g_i)$, which is used to obtain the improved rank-list.
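A simplified sketch of this re-ranking scheme is shown below. It implements the set-based Jaccard distance over $k$-reciprocal sets and the final distance $d^*$, but omits the $R^*$ expansion, the Gaussian-kernel encoding and the local query expansion of @rerank-paper; `orig_dists` is assumed to be a square matrix of original distances over the pooled query and gallery images, since reciprocity requires ranking in both directions:

```python
import numpy as np

def k_reciprocal_sets(orig_dists, k):
    """R(p, k): items that rank p in their own top-k and vice versa."""
    topk = np.argsort(orig_dists, axis=1)[:, :k]
    return [{g for g in topk[p] if p in topk[g]}
            for p in range(orig_dists.shape[0])]

def rerank(orig_dists, k1=9, lam=0.3):
    """Final distance d* = (1 - lambda) * d_J + lambda * d."""
    R = k_reciprocal_sets(orig_dists, k1)
    n = orig_dists.shape[0]
    d_star = np.empty_like(orig_dists, dtype=float)
    for p in range(n):
        for g in range(n):
            union, inter = R[p] | R[g], R[p] & R[g]
            # Jaccard distance between the two k-reciprocal sets.
            d_j = 1.0 - len(inter) / len(union) if union else 1.0
            d_star[p, g] = (1 - lam) * d_j + lam * orig_dists[p, g]
    return d_star
```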
## Optimisation

The goal is to learn optimal values for $k_1$, $k_2$ and $\lambda$ on the training set, such that an improved top-1 identification accuracy is obtained. This is done using a multi-direction search algorithm to estimate $k_{1_{opt}}$ and $k_{2_{opt}}$, and an exhaustive search for $\lambda$ from $\lambda = 0$ (exclusively Jaccard distance) to $\lambda = 1$ (only original distance) in steps of 0.1. The results obtained through this approach suggest $k_{1_{opt}}=9$, $k_{2_{opt}}=3$ and $0.1\leq\lambda_{opt}\leq 0.3$. To verify the optimality of $k_{1_{opt}}$ and $k_{2_{opt}}$, heat plots were produced (figure \ref{fig:pqvals}), showing that the optimal values obtained from training are close to those of the local maximum on gallery and query. The $\lambda$ search is plotted in figure \ref{fig:lambda}.

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/pqvals.pdf}
\includegraphics[width=12em]{fig/trainpqvals.pdf}
\caption{Identification accuracy varying $k_1$ and $k_2$ (gallery-query left, train right), $\lambda=0.3$}
\label{fig:pqvals}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/lambda_acc.pdf}
\includegraphics[width=12em]{fig/lambda_acc_tr.pdf}
\caption{Top 1 identification accuracy with re-ranking, varying $\lambda$ (gallery-query left, train right), $k_1=9$, $k_2=3$}
\label{fig:lambda}
\end{center}
\end{figure}

## $k$-reciprocal Re-ranking Evaluation

Besides top $k$ accuracy, it is necessary to estimate how precise the generated rank-list is; for this reason mean average precision (mAP) is introduced as an additional evaluation metric [@mAP]. Re-ranking achieves better results than the baseline methods on both measures: with re-ranking, mAP improves from 47.2% to 61.7%. The rank-list improvement of re-ranking over Euclidean NN (figure \ref{fig:eucrank}) is shown visually in figure \ref{fig:ranklist2}.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/ranklist.png}
\caption{Top 10 rank-lists (with re-ranking) generated for 5 query images}
\label{fig:ranklist2}
\end{center}
\end{figure}

Figure \ref{fig:compare} shows a comparison between the top $k$ identification accuracies obtained with the two methods. The $k$-reciprocal re-ranking method significantly improves the results even at top-1, boosting identification accuracy from 47% to 56.5%. The difference between the top $k$ accuracies of the two methods shrinks as $k$ increases.

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/comparison.pdf}
\caption{Top $k$ identification accuracy ($\lambda=0.3$, $k_1=9$, $k_2=3$)}
\label{fig:compare}
\end{center}
\end{figure}

Figure \ref{fig:rerank} shows successful re-ranking performed on the query and gallery sets. The $k$-reciprocal rank-list of the second original nearest neighbour contains the target identity, so when the final distance is calculated as a function of the original and Jaccard distances, the matching identity becomes the closest neighbour. This happens for three instances of the identity, despite the images being visually hard to classify due to the presence of another person in the photos and a completely different pose.

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/rerank.pdf}
\caption{Re-ranking example}
\label{fig:rerank}
\end{center}
\end{figure}
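For reference, the following is a minimal sketch of the mAP computation used in this evaluation, following the usual definition of average precision over a ranked gallery (hypothetical array names as before):

```python
import numpy as np

def mean_average_precision(dists, query_labels, gallery_labels):
    """mAP: precision at the rank of each true match, averaged per
    query, then averaged over all queries."""
    ranklists = np.argsort(dists, axis=1)
    aps = []
    for i in range(dists.shape[0]):
        hits = gallery_labels[ranklists[i]] == query_labels[i]
        if not hits.any():
            continue  # identity absent from the gallery
        match_ranks = np.flatnonzero(hits) + 1          # 1-based ranks
        precisions = np.arange(1, len(match_ranks) + 1) / match_ranks
        aps.append(precisions.mean())
    return float(np.mean(aps))
```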
# Conclusions

\begin{table}
\centering
\begin{tabular}{ll}
\hline
\textbf{Process} & \textbf{Complexity} \\ \hline
Euclidean distance calculation & $O(N^{2})$ \\
Mahalanobis distance calculation & $O(N^{3})$ ($O(N^{2.708})$) \\
Re-ranking with Jaccard & $O(N^{2}\log N)$ \\ \hline
\end{tabular}
\caption{Computational complexity of the evaluated methods}
\label{tab:complexity}
\end{table}

Overall, the re-ranking method gives a significant improvement in both top $k$ accuracy and mean average precision. The cost of this operation is an increase in computation time due to the higher complexity compared to the baseline, as summarised in table \ref{tab:complexity}.

# References