# Codebooks

## K-means codebook

A common technique for codebook generation involves applying K-means clustering to a sample of the image descriptors. In this way, descriptors are mapped to *visual words*, which lend themselves to binning and therefore to the creation of bag-of-words histograms for use in classification. In this coursework, 100,000 SIFT descriptors were extracted from the Caltech dataset to build the visual vocabulary.

## Vocabulary size

The number of clusters, i.e. the number of centroids, determines the vocabulary size when creating the codebook with the K-means method. Each descriptor is mapped to its nearest centroid, and every descriptor belonging to that cluster is mapped to the same *visual word*. Similar descriptors are thus mapped to the same word, allowing images to be compared through bag-of-words techniques.

## Bag-of-words histogram quantisation of descriptor vectors

Example histograms for training and testing images are shown in figure \ref{fig:histo_tr}, computed with a vocabulary size of 100. The histograms show similar counts for the same words, demonstrating that the images contain descriptors matching the *keywords* in similar proportions. We later examine the effect of the vocabulary size (as determined by the number of K-means centroids) on classification accuracy in figure \ref{fig:km_vocsize}. A vocabulary that is too small misrepresents the information contained in the different patches, resulting in poor classification accuracy, while a vocabulary that is too large (too many K-means centroids) instead leads to overfitting. Figure \ref{fig:km_vocsize} shows a plateau after 60 cluster centres.

The time complexity of quantisation with a K-means codebook is $O(n^{dk+1})$, where $n$ is the number of entities to be clustered, $d$ is the dimension and $k$ is the cluster count @cite[km-complexity]. As the computation time is high, our tests use a subsample of descriptors to compute the centroids. We also tried applying PCA to the descriptor vectors to improve time performance; however, the descriptors here are of relatively small dimension, so we opted to avoid PCA for further training.

K-means converges to a local optimum and depends heavily on the initialisation values of the centroids. Initialising K-means is an expensive process, based on sequential attempts at centroid placement, and running multiple initialisations increases execution time linearly. Attempting centroid initialisation more than once did not bring significant accuracy improvements on the data analysed in this coursework, and only increased execution time.

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/trainhist.pdf}
\includegraphics[width=12em]{fig/testhist.pdf}
\caption{Bag-of-words histograms; training left, testing right}
\label{fig:histo_tr}
\end{center}
\end{figure}
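The pipeline above can be summarised in code. Below is a minimal sketch, assuming scikit-learn and descriptors already extracted with SIFT; `all_descriptors` and the per-image descriptor arrays are hypothetical placeholders for our extraction step. It subsamples the descriptors before clustering and quantises each image into a bag-of-words histogram, as described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, vocab_size=100, sample_size=100_000, seed=0):
    """Cluster a subsample of SIFT descriptors into `vocab_size` visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(all_descriptors),
                     size=min(sample_size, len(all_descriptors)), replace=False)
    # n_init=1: repeated centroid initialisation gave no accuracy gain here.
    return KMeans(n_clusters=vocab_size, n_init=1, random_state=seed).fit(all_descriptors[idx])

def bow_histogram(codebook, image_descriptors):
    """Quantise one image's descriptors into a normalised bag-of-words histogram."""
    words = codebook.predict(image_descriptors)   # nearest centroid per descriptor
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()                      # normalise for comparability
```

Histograms such as those in figure \ref{fig:histo_tr} are then obtained by calling `bow_histogram` once per training and testing image.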
# RF classifier

## Hyperparameter tuning

Figure \ref{fig:km-tree-param} shows the effect of tree depth and number of trees for a K-means codebook with 100 cluster centres.

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/error_depth_kmean100.pdf}
\includegraphics[width=12em]{fig/trees_kmean.pdf}
\caption{Classification error varying tree depth (left) and number of trees (right)}
\label{fig:km-tree-param}
\end{center}
\end{figure}

At each split, a random forest selects a random subset of the features, applies a weak learner (such as an axis-aligned split) to each, and performs the split on the best of the selected features according to some criterion (our results use the *Gini index*). The fewer features compared at each split, the quicker the trees are built and the more random they are. The randomness parameter can therefore be taken to be the number of features considered when making splits. We evaluate accuracy under different degrees of randomness when using a K-means vocabulary in figure \ref{fig:kmeanrandom}.

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/new_kmean_random.pdf}
\caption{Effect of the randomness parameter on K-means classification accuracy}
\label{fig:kmeanrandom}
\end{center}
\end{figure}

Changing the randomness parameter had no significant effect on execution time. This can partly be explained by the greater tree depth required to purify the training set.

## Weak learner comparison

Figure \ref{fig:2pt} shows an improvement in recognition accuracy of 1% with the two-pixel test, which achieves better results than its axis-aligned counterpart. The two-pixel test, however, brings a slight decrease in time performance, measured at 3 seconds more on average. This is due to the added complexity of the two-pixel test, which adds one dimension to the computation.

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/2pixels_kmean.pdf}
\caption{K-means classification accuracy for different types of weak learner}
\label{fig:2pt}
\end{center}
\end{figure}

## Impact of the vocabulary size on classification accuracy

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/kmeans_vocsize.pdf}
\includegraphics[width=12em]{fig/time_kmeans.pdf}
\caption{Effect of vocabulary size; classification error (left), time (right)}
\label{fig:km_vocsize}
\end{center}
\end{figure}

## Confusion matrix for case XXX, with examples of failure and success

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/e100k256d5_cm.pdf}
\caption{K-means confusion matrix (e100k256d5)}
\label{fig:km_cm}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=10em]{fig/success_km.pdf}
\includegraphics[width=10em]{fig/fail_km.pdf}
\caption{K-means: success on the left; failure on the right}
\label{fig:km_succ}
\end{center}
\end{figure}

# RF codebook

An alternative to codebook creation via K-means is to use an ensemble of totally random trees: each descriptor is coded according to the leaf of each tree in the ensemble into which it is sorted. This effectively performs an unsupervised transformation of the dataset into a high-dimensional sparse representation. The vocabulary size is determined by the number of leaves in each random tree and the ensemble size.
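This coding scheme corresponds to what scikit-learn exposes as `RandomTreesEmbedding`. A minimal sketch under that assumption is shown below; the parameter values are illustrative rather than the ones used in our experiments, and `descriptor_sample` is a hypothetical placeholder for a subsample of training descriptors.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Fit an ensemble of totally random trees; no labels are used.
# Vocabulary size = (leaves per tree) x (number of trees).
embedder = RandomTreesEmbedding(n_estimators=256, max_depth=5, random_state=0)
embedder.fit(descriptor_sample)

def rf_bow_histogram(embedder, image_descriptors):
    """Code each descriptor by the leaves it reaches, then pool over the image."""
    leaves = embedder.transform(image_descriptors)  # sparse one-hot leaf indicators
    hist = np.asarray(leaves.sum(axis=0)).ravel()   # descriptor count per leaf
    return hist / hist.sum()
```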
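The resulting histograms are then fed to the same random forest classifier as in the previous section. A sketch, again assuming scikit-learn, where `max_features` plays the role of the randomness parameter discussed under hyperparameter tuning, and `train_hists`, `train_labels`, `test_hists` and `test_labels` are hypothetical placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

# Axis-aligned weak learner with the Gini criterion; `max_features`
# controls how many features are compared at each split (randomness).
clf = RandomForestClassifier(n_estimators=100, max_depth=5,
                             criterion="gini", max_features="sqrt",
                             random_state=0)
clf.fit(train_hists, train_labels)
print(clf.score(test_hists, test_labels))  # classification accuracy
```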
\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/256t1_e200D5_cm.pdf}
\caption{RF codebook confusion matrix (e100k256d5)}
\label{fig:p3_cm}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=10em]{fig/success_3.pdf}
\includegraphics[width=10em]{fig/fail_3.pdf}
\caption{RF codebook: success on the left; failure on the right}
\label{fig:p3_succ}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/error_depth_p3.pdf}
\includegraphics[width=12em]{fig/trees_p3.pdf}
\caption{Classification error varying tree depth (left) and number of trees (right)}
\label{fig:p3_trees}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/p3_rand.pdf}
\caption{Effect of the randomness parameter on classification error}
\label{fig:p3_rand}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=12em]{fig/p3_vocsize.pdf}
\includegraphics[width=12em]{fig/p3_time.pdf}
\caption{Effect of vocabulary size; classification error (left), time (right)}
\label{fig:p3_voc}
\end{center}
\end{figure}

\begin{figure}[H]
\begin{center}
\includegraphics[width=18em]{fig/p3_colormap.pdf}
\caption{Varying leaves and estimators: effect on accuracy}
\label{fig:p3_colormap}
\end{center}
\end{figure}

# Comparison of methods and conclusions

# References