From 410b6adba3cf266fbbe6455caae9c33712b1b8c5 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Mon, 10 Dec 2018 20:15:51 +0000
Subject: Add todo sections

---
 report2/paper.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/report2/paper.md b/report2/paper.md
index cf8221a..98915e8 100755
--- a/report2/paper.md
+++ b/report2/paper.md
@@ -75,8 +75,11 @@ identification is shown in red.
 \end{center}
 \end{figure}
 
-Normalization of the feature vectors does not improve accuracy results of the
-baseline as it can be seen in figure \ref{fig:baselineacc}. ###EXPLAIN WHY
+Magnitude normalisation of the feature vectors does not improve the accuracy
+of the baseline, as can be seen in figure \ref{fig:baselineacc}.
+This is because the feature vectors already appear to be scaled relative to
+their significance for distance-based classification, so normalising discards
+the importance scaling that has previously been introduced into the features.
 
 ## kMean Clustering
 
@@ -89,8 +92,14 @@ classify the query image.
 This method did not bring any major improvement to the baseline, as it can be
 seen from figure \ref{fig:baselineacc}. It is noticeable how the number of
 clusters affects performance, showing better identification accuracy for a
 number of clusters away from
-the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). ###EXPLAIN WHY
+the local minimum achieved at 60 clusters (figure \ref{fig:kmeans}). This trend can likely be explained by the number of distance comparisons performed.
+We would expect clustering with $k=1$ and $k=\textrm{label count}$ to have the same performance as
+the baseline approach without clustering, as we are performing the same number of comparisons.
+
+Clustering is an effective way of reducing computation time. Assuming 39 clusters of 39 neighbours each, we would perform only 78 distance computations (39 to the cluster centres and 39 within the selected cluster) for a gallery of size 1487, instead of the original 1487. This, however, comes at the cost of ignoring neighbours from other clusters which may be closer. Since clusters do not necessarily contain the same number of data points (their sizes are uneven), we find that the lowest average number of comparisons occurs at around 60 clusters, which also appears to be the worst-performing number of clusters.
+
+We find that, for this query and gallery set, clustering does not seem to improve identification accuracy, and we consider it an additional baseline.
 
 \begin{figure}
 \begin{center}
-- 
cgit v1.2.3-54-g00ecf
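
A minimal sketch of the clustered retrieval described above, assuming scikit-learn and NumPy (neither is specified in the paper) and with illustrative function names: the gallery features are partitioned with k-means, a query is assigned to its nearest centroid, and nearest-neighbour identification is then restricted to that cluster, giving roughly $k + n/k$ distance computations per query instead of $n$ (about 78 for $n = 1487$, $k = 39$, matching the figure quoted above).

```python
# Illustrative sketch only: the library choice (scikit-learn, NumPy) and all
# names here are assumptions, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans

def cluster_gallery(gallery_feats, n_clusters=39, seed=0):
    # Partition the gallery feature vectors into n_clusters groups.
    return KMeans(n_clusters=n_clusters, random_state=seed).fit(gallery_feats)

def query_identity(query_feat, gallery_feats, gallery_labels, kmeans):
    # Step 1: find the nearest cluster centre (k distance computations).
    cluster = kmeans.predict(query_feat.reshape(1, -1))[0]
    # Step 2: nearest neighbour restricted to that cluster (~n/k computations);
    # closer gallery points in other clusters are deliberately ignored.
    members = np.where(kmeans.labels_ == cluster)[0]
    dists = np.linalg.norm(gallery_feats[members] - query_feat, axis=1)
    return gallery_labels[members[np.argmin(dists)]]
```

With $k=1$ the within-cluster search degenerates to the plain baseline over the whole gallery; intermediate values of $k$ trade identification accuracy for fewer comparisons, which is the trend discussed above.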