# Formulation of the Addressed Machine Learning Problem

## Problem Definition

The person re-identification problem presented in this paper requires matching pedestrian images, obtained from pedestrian detectors, across disjoint cameras. The problem is challenging because identities captured in the photos are subject to variations in lighting, pose, blur, background and occlusion across camera views. This report considers features extracted from the CUHK03 dataset with a 50-layer residual network (ResNet50), and evaluates distance metric techniques that can be used to perform person re-identification across *disjoint* cameras using these features.

## Dataset - CUHK03 Summary

The CUHK03 dataset contains 14096 pictures of people captured from two different cameras. The feature vectors used, extracted from a trained ResNet50 model, contain 2048 features that are used for identification. The pictures represent 1467 different identities, each of which appears 9 to 10 times. The data is separated into train, query and gallery sets through `train_idx`, `query_idx` and `gallery_idx` respectively, where the training set was used to train the ResNet50 model employed for feature extraction. This allows distance metric learning techniques to be evaluated on the query and gallery sets in the knowledge that the features are not overfitted to them, since they were produced by a model derived from the training set alone.

## Problem to Solve

The problem to solve is to create a ranklist for each image of the query set by finding its nearest neighbour(s) within the gallery set. However, gallery images with the same label and taken from the same camera as the query image must not be considered when forming the ranklist.

## Nearest Neighbour Ranklist

Nearest Neighbour finds the gallery image whose features are closest to those of a query image, and predicts the class of the query image to be that of its nearest neighbour(s). The distance between images can be calculated with different metrics, one of the most common being the Euclidean distance:

$$ \textrm{NN}(x) = \operatorname*{argmin}_{i\in[m]} \|x-x_i\| $$

Alternative metrics such as the Jaccard and Mahalanobis distances can be used in place of the Euclidean distance.

# Baseline Evaluation

To evaluate the improvements brought by alternative distance metrics, a baseline is established through nearest neighbour identification as described above. Identification accuracies at top1, top5 and top10 are 47%, 67% and 75% respectively (figure \ref{fig:baselineacc}). The mAP for a ranklist of size 10 is 33.3%.

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/baseline.pdf}
\caption{Recognition accuracy of baseline Nearest Neighbor @rank k}
\label{fig:baselineacc}
\end{center}
\end{figure}

Figure \ref{fig:eucrank} shows the ranklists generated through baseline NN for 5 query images (black). Correct identifications are shown in green and incorrect identifications in red.

\begin{figure}
\begin{center}
\includegraphics[width=22em]{fig/eucranklist.png}
\caption{Ranklist @rank10 generated for 5 query images}
\label{fig:eucrank}
\end{center}
\end{figure}

Magnitude normalization of the feature vectors does not improve the accuracy of the baseline, as can be seen in figure \ref{fig:baselineacc}. This is because the feature vectors appear to be scaled relative to their significance for optimal distance classification, and normalising discards this scaling by importance that is already built into the features.
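To make the baseline concrete, the sketch below builds the Euclidean ranklists while excluding gallery entries that share both label and camera with the query, and scores top-k identification accuracy. It is a minimal sketch assuming the features, labels and camera indices are available as NumPy arrays; the variable and function names are illustrative and do not correspond to the accompanying repository.

```python
import numpy as np

def baseline_ranklists(query_feat, query_lab, query_cam,
                       gall_feat, gall_lab, gall_cam, k=10):
    """Euclidean NN ranklists, excluding same-label/same-camera gallery images."""
    ranklists = []
    for feat, lab, cam in zip(query_feat, query_lab, query_cam):
        dist = np.linalg.norm(gall_feat - feat, axis=1)        # Euclidean distance
        # gallery images with the query's label *and* camera are not valid matches
        dist[(gall_lab == lab) & (gall_cam == cam)] = np.inf
        ranklists.append(np.argsort(dist)[:k])
    return np.array(ranklists)

def topk_accuracy(ranklists, query_lab, gall_lab, k):
    """Fraction of queries with at least one correct match in the top k."""
    hits = [np.any(gall_lab[r[:k]] == lab) for r, lab in zip(ranklists, query_lab)]
    return np.mean(hits)
```

Setting the excluded distances to infinity keeps the gallery indexing intact while guaranteeing those images never reach the top of the ranklist.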
## kMeans Clustering

An addition considered for the baseline is *kMeans clustering*. In theory this method reduces the computational complexity of baseline NN by forming clusters and comparing the query image only against the cluster centres; the elements assigned to the closest cluster centre are then used to perform NN and classify the query image. This method did not bring any major improvement to the baseline, as can be seen in figure \ref{fig:baselineacc}. It is noticeable how the number of clusters affects performance, with better identification accuracy for cluster counts away from the local minimum at 60 clusters (figure \ref{fig:kmeans}). This trend can likely be explained by the number of distance comparisons performed. We would expect clustering with $k=1$ and $k=\textrm{label count}$ to match the performance of the baseline approach without clustering, as we are performing the same number of comparisons.

Clustering is an effective way of reducing computation time. Assuming 39 clusters of 39 neighbours each, we would perform only 78 distance computations for a gallery of size 1487, instead of the original 1487. This however comes at the cost of ignoring neighbours from other clusters which may be closer. Since clusters do not necessarily contain the same number of datapoints (sizes are uneven), the lowest average number of comparisons occurs at around 60 clusters, which also appears to be the worst performing cluster count. We find that clustering does not improve identification accuracy on the query and gallery sets, and we consider it an additional baseline.

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/kmeanacc.pdf}
\caption{Top 1 Identification accuracy varying kmeans cluster size}
\label{fig:kmeans}
\end{center}
\end{figure}
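The cluster-then-search idea can be sketched as follows, assuming scikit-learn is available; the helper names are hypothetical and the sketch is illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_gallery_clusters(gall_feat, n_clusters=39):
    """Cluster the gallery features once; the model is reused for every query."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(gall_feat)

def cluster_nn_ranklist(query_vec, gall_feat, km, k=10):
    """NN restricted to the cluster whose centre is closest to the query:
    roughly n_clusters + cluster-size comparisons instead of the full gallery,
    at the cost of ignoring closer neighbours assigned to other clusters."""
    centre_dist = np.linalg.norm(km.cluster_centers_ - query_vec, axis=1)
    members = np.where(km.labels_ == np.argmin(centre_dist))[0]
    member_dist = np.linalg.norm(gall_feat[members] - query_vec, axis=1)
    return members[np.argsort(member_dist)[:k]]  # gallery indices forming the ranklist
```

With 39 evenly sized clusters this amounts to roughly 39 centre comparisons plus 39 member comparisons per query, matching the figure of 78 quoted above.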
# Suggested Improvement

## k-reciprocal Reranking Formulation

The approach adopted to improve identification performance is based on k-reciprocal reranking. This section summarizes the idea behind the method illustrated in **REFERENCE PAPER**.

We define $N(p,k)$ as the top k elements of the ranklist generated through NN, where p is a query image. The k-reciprocal ranklist $R(p,k)$ is defined as $R(p,k)=\{g_i|(g_i \in N(p,k))\land(p \in N(g_i,k))\}$. By adding the $\frac{1}{2}k$ reciprocal nearest neighbours of each element $q$ of $R(p,k)$, it is possible to form a more reliable set $R^*(p,k) \longleftarrow R(p,k) \cup R(q,\tfrac{1}{2}k),\; q \in R(p,k)$, which aims to overcome the problem of query and gallery images being affected by factors such as position, illumination and foreign objects. $R^*(p,k)$ is then used to recalculate the distance between query and gallery images. The Jaccard metric of the k-reciprocal sets gives the distance between p and $g_i$ as:

$$d_J(p,g_i)=1-\frac{|R^*(p,k)\cap R^*(g_i,k)|}{|R^*(p,k)\cup R^*(g_i,k)|}$$

However, since the neighbours of the query p are close to $g_i$ as well, they would be more likely to be identified as true positives. This calls for a more discriminative method, achieved by encoding the k-reciprocal neighbours into an N-dimensional vector as a function of the original distance (in our case the squared Euclidean distance $d(p,g_i) = \|p-g_i\|^2$) through a Gaussian kernel:

\begin{equation}
V_{p,g_i}=
\begin{cases}
e^{-d(p,g_i)}, & \text{if}\ g_i\in R^*(p,k) \\
0, & \text{otherwise.}
\end{cases}
\end{equation}

Through this transformation the Jaccard distance can be reformulated as:

$$ d_J(p,g_i)=1-\frac{\sum\limits_{j=1}^N \min(V_{p,g_j},V_{g_i,g_j})}{\sum\limits_{j=1}^N \max(V_{p,g_j},V_{g_i,g_j})} $$

It is then possible to perform a local query expansion using the neighbours $g_i$ of p, defined as $V_p=\frac{1}{|N(p,k_2)|}\sum\limits_{g_i\in N(p,k_2)}V_{g_i}$. A separate parameter $k_2$ is used here because this neighbourhood is kept small to limit the noise introduced by the expansion, while the size k of the $R^*$ sets is denoted $k_1$: $R^*(g_i,k_1)$.

The two distances are then combined into a final distance $d^*(p,g_i)$ used to build the improved ranklist: $d^*(p,g_i)=(1-\lambda)d_J(p,g_i)+\lambda d(p,g_i)$.

The aim is to learn values of $k_1,k_2$ and $\lambda$ on the training set that improve top1 identification accuracy. This is done through a simple gradient descent algorithm followed by an exhaustive search to estimate $k_{1_{opt}}$ and $k_{2_{opt}}$ for eleven values of $\lambda$ from zero (only Jaccard distance) to one (only original distance) in steps of 0.1. The results obtained through this approach suggest $k_{1_{opt}}=9$, $k_{2_{opt}}=3$ and $0.1\leq\lambda_{opt}\leq 0.3$.
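To make the pipeline concrete, the sketch below implements a simplified version of the reranking described above: it omits the additional overlap condition the reference paper applies when expanding $R(p,k)$, and the Jaccard computation is written densely for clarity rather than efficiency. It assumes `dist` is the pairwise squared-Euclidean distance matrix over the combined query and gallery features; all names are illustrative.

```python
import numpy as np

def k_reciprocal_set(rank, i, k):
    """R(i,k): j is in the top-k of i AND i is in the top-k of j."""
    forward = rank[i, :k]
    return np.array([j for j in forward if i in rank[j, :k]])

def rerank_distances(dist, k1=9, k2=3, lam=0.3):
    """Blend the original distance with a Jaccard distance computed from
    Gaussian-kernel encodings of (expanded) k-reciprocal sets."""
    n = dist.shape[0]
    rank = np.argsort(dist, axis=1)                 # full ranklist per image
    V = np.zeros((n, n))
    for i in range(n):
        r_star = set(k_reciprocal_set(rank, i, k1))
        # expand with the 1/2*k1 reciprocal neighbours of each member of R(i,k1)
        for q in list(r_star):
            r_star |= set(k_reciprocal_set(rank, q, k1 // 2))
        idx = np.array(sorted(r_star))
        V[i, idx] = np.exp(-dist[i, idx])           # Gaussian kernel encoding
    # local query expansion: average V over the k2 nearest neighbours
    V = np.stack([V[rank[i, :k2]].mean(axis=0) for i in range(n)])
    # Jaccard distance from the encoded vectors (dense; real code exploits sparsity)
    d_jaccard = np.zeros((n, n))
    for i in range(n):
        mins = np.minimum(V[i], V).sum(axis=1)
        maxs = np.maximum(V[i], V).sum(axis=1)
        d_jaccard[i] = 1.0 - mins / np.maximum(maxs, 1e-12)
    return (1 - lam) * d_jaccard + lam * dist       # d* = (1-lambda)d_J + lambda*d
```

The defaults match the values suggested by the search above ($k_1=9$, $k_2=3$, $\lambda=0.3$), but since the sketch skips the reference paper's refinements it should be read as illustrative rather than as the evaluated implementation.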
## k-reciprocal Reranking Evaluation

\begin{figure}
\begin{center}
\includegraphics[width=24em]{fig/ranklist.png}
\caption{Ranklist (improved method) @rank10 generated for 5 query images}
\label{fig:ranklist2}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=20em]{fig/comparison.pdf}
\caption{Comparison of recognition accuracy @rank k (KL=0.3,K1=9,K2=3)}
\label{fig:compare}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/pqvals.pdf}
\includegraphics[width=12em]{fig/trainpqvals.pdf}
\caption{Identification accuracy varying K1 and K2 (gallery-query left, train right)}
\label{fig:pqvals}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=12em]{fig/lambda_acc.pdf}
\includegraphics[width=12em]{fig/lambda_acc_tr.pdf}
\caption{Top 1 Identification Accuracy with Rerank varying lambda (gallery-query left, train right)}
\label{fig:lambda}
\end{center}
\end{figure}

# Comment on Mahalanobis Distance as a metric

We were not able to achieve significant improvements using the Mahalanobis distance for the original distance ranking compared to the squared Euclidean metric. Results can be reproduced using the `-m|--mahalanobis` flag when running the evaluation with the repository accompanying this paper.

COMMENT ON VARIANCE AND MAHALANOBIS RESULTS

# Conclusion

# References

# Appendix

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/cdist.pdf}
\caption{First two features of gallery(o) and query(x) feature data}
\label{fig:subspace}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/clusteracc.pdf}
\caption{Top k identification accuracy for cluster count}
\label{fig:clustk}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/jaccard.pdf}
\caption{Explained Jaccard}
\label{fig:jaccard}
\end{center}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/mahalanobis.pdf}
\caption{Explained Mahalanobis}
\label{fig:mahalanobis}
\end{center}
\end{figure}