From b9ba9a17236c666dcb2119612e0783be95d2b465 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov <v@skozl.com>
Date: Tue, 12 Feb 2019 20:40:10 +0000
Subject: Fix all caption text

---
 report/paper.md | 50 +++++++++++++++++++++++---------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/report/paper.md b/report/paper.md
index 4ab5924..112bd6e 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -16,6 +16,16 @@ The number of clusters or the number of centroids determines the vocabulary size
 
 An example histograms for training and testing images is shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. The histograms of the same class appear to have comparable magnitudes for their respective keywords, demonstrating they had a similar number of descriptors which mapped to each of the clusters. The effect of the vocubalary size (as determined by the number of K-means centroids) on the classificaiton accuracy is shown in figure \ref{fig:km_vocsize}. A small vocabulary size tends to misrepresent the information contained in the different patches, resulting in poor classification accuracy. Conversly a large vocabulary size (many K-mean centroids), may display overfitting. In our tests, we observe a plateau after a cluster count of 60 on figure \ref{fig:km_vocsize}.
 
+\begin{figure}[H]
+\begin{center}
+\includegraphics[width=12em]{fig/kmeans_vocsize.pdf}
+\includegraphics[width=12em]{fig/time_kmeans.pdf}
+\caption{Effect of vocabulary size; classification error (left) and time (right)}
+\label{fig:km_vocsize}
+\end{center}
+\end{figure}
+
+
 The time complexity of quantisation with a K-means codebooks is $O(DNK)$, where N is the number of entities to be clustered (descriptors), D is the dimension (of the descriptors) and K is the cluster count [@km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids (a random selection of 100 thousand descriptors). An alternative method we tried is applying PCA to the descriptors vectors to improve time performance. However, the descriptor dimension of 128 is relatiely small and as such we found PCA to be unnecessary.
 
 K-means is a process that converges to local optima and heavily depends on the initialization values of the centroids.
@@ -25,40 +35,39 @@ Initializing K-means is an expensive process, based on sequential attempts of ce
 \begin{center}
 \includegraphics[width=12em]{fig/trainhist.pdf}
 \includegraphics[width=12em]{fig/testhist.pdf}
-\caption{Bag-of-words histograms; Training left, Testing right}
+\caption{Bag-of-words histograms; Training (left), Testing (right)}
 \label{fig:histo_tr}
 \end{center}
 \end{figure}
 
 # RF classifier 
 
-We use a Random Forest Classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees. Random Forest classifier performance depends on the ensemble size, tree depth, randomness and weak learner used.
+We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees. Random forest classifier performance depends on the ensemble size, tree depth, randomness and weak learner used.
 
 ## Hyperparameters tuning
 
-Figure \ref{fig:km-tree-param} shows the effect of tree depth and number of trees
-for K-means 100 cluster centers.
+Figure \ref{fig:km-tree-param} shows the effect of tree depth and number of trees, when classifying a bag-of-words created by K-means with 100 cluster centers.
 
 \begin{figure}[H]
 \begin{center}
 \includegraphics[width=12em]{fig/error_depth_kmean100.pdf}
 \includegraphics[width=12em]{fig/trees_kmean.pdf}
-\caption{Classification error varying trees depth(left) and numbers of trees(right)}
+\caption{Classification error varying tree depth (left) and forest size (right)}
 \label{fig:km-tree-param}
 \end{center}
 \end{figure}
 
-Random forests will select a random number of features on which to apply a weak learner (such as axis aligned split) and then chose the best feature of the sampled ones to perform the split on, based on a given criteria (our results use the *Gini index*). The fewer features that are compared for each split the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary in figure \ref{fig:kmeanrandom}. The results in the figure use a forest size of 100 as we found that this estimatator count the improvement for $\sqrt{n}$ performance gains tend to plateau.
+Random forests will select a random number of features on which to apply a weak learner (such as axis aligned split) and then chose the best feature of the sampled ones to perform the split on, based on a given criteria (our results use the *Gini index*). The fewer features that are compared for each split the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary in figure \ref{fig:kmeanrandom}. The results in the figure \ref{fig:kmeanrandom} use a forest size of 100 as we infered that this is the estimatator count for which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
 
 \begin{figure}[H]
 \begin{center}
 \includegraphics[width=18em]{fig/new_kmean_random.pdf}
-\caption{newkmeanrandom}
+\caption{Classification error for different number of random features}
 \label{fig:kmeanrandom}
 \end{center}
 \end{figure}
 
-Changing the randomness parameter had no significant effect on execution time. This can partly be explained by the increased required tree depth to purify the training set.
+Changing the randomness parameter had no significant effect on execution time. This may be attributed to increased required tree depth to purify the training set.
 
 ## Weak Learner comparison
 
@@ -75,23 +84,10 @@ more. This is due to the complexity added by the two-pixels test, since it adds
 \end{center}
 \end{figure}
 
-## Impact of RF vocabulary size on classification accuracy. 
-
-\begin{figure}[H]
-\begin{center}
-\includegraphics[width=12em]{fig/kmeans_vocsize.pdf}
-\includegraphics[width=12em]{fig/time_kmeans.pdf}
-\caption{Effect of vocabulary size; classification error left, time right}
-\label{fig:km_vocsize}
-\end{center}
-\end{figure}
-
-## Confusion matrix for case XXX, with examples of failure and success 
-
 \begin{figure}[H]
 \begin{center}
 \includegraphics[width=18em]{fig/e100k256d5_cm.pdf}
-\caption{e100k256d5cm K-means Confusion Matrix}
+\caption{Confusion Matrix: K=256, ClassifierForestSize=100, Depth=5}
 \label{fig:km_cm}
 \end{center}
 \end{figure}
@@ -100,7 +96,7 @@ more. This is due to the complexity added by the two-pixels test, since it adds
 \begin{center}
 \includegraphics[width=10em]{fig/success_km.pdf}
 \includegraphics[width=10em]{fig/fail_km.pdf}
-\caption{K-means: Success on the left; Failure on the right}
+\caption{K-means + RF Classifier: Success (left); Failure (right)}
 \label{fig:km_succ}
 \end{center}
 \end{figure}
@@ -113,7 +109,7 @@ which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Codebook mappi
 \begin{figure}[H]
 \begin{center}
 \includegraphics[width=18em]{fig/256t1_e200D5_cm.pdf}
-\caption{Part 3 confusion matrix e100k256d5cm}
+\caption{Confusion Matrix: CodeBookForestSize=256; ClassifierForestSize=200; Depth=5}
 \label{fig:p3_cm}
 \end{center}
 \end{figure}
@@ -122,7 +118,7 @@ which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Codebook mappi
 \begin{center}
 \includegraphics[width=10em]{fig/success_3.pdf}
 \includegraphics[width=10em]{fig/fail_3.pdf}
-\caption{Part3: Success on the left; Failure on the right}
+\caption{Part3: Success (left) and Failure (right)}
 \label{fig:p3_succ}
 \end{center}
 \end{figure}
@@ -131,7 +127,7 @@ which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Codebook mappi
 \begin{center}
 \includegraphics[width=12em]{fig/error_depth_p3.pdf}
 \includegraphics[width=12em]{fig/trees_p3.pdf}
-\caption{Classification error varying trees depth(left) and numbers of trees(right)}
+\caption{Classification error varying trees depth (left) and numbers of trees (right)}
 \label{fig:p3_trees}
 \end{center}
 \end{figure}
@@ -148,7 +144,7 @@ which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Codebook mappi
 \begin{center}
 \includegraphics[width=12em]{fig/p3_vocsize.pdf}
 \includegraphics[width=12em]{fig/p3_time.pdf}
-\caption{Effect of vocabulary size; classification error left, time right}
+\caption{Effect of vocabulary size; classification error (left) and time (right)}
 \label{fig:p3_voc}
 \end{center}
 \end{figure}
-- 
cgit v1.2.3-70-g09d2


From 28e951c4e7c590cfeded709539a59cfae8519e81 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov <v@skozl.com>
Date: Tue, 12 Feb 2019 20:45:18 +0000
Subject: Slight changes to conclusion

---
 report/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/report/paper.md b/report/paper.md
index 112bd6e..4c146c8 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -159,7 +159,7 @@ which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Codebook mappi
 
 # Comparison of methods and conclusions
 
-Overall K-means achieves slightly better accuracy that the RF-codebook at the expense of a higher execution time for training **(and testing???)**.
+Overall we observe slightly higher accuracy when using K-means codebooks compared to RF codebook at the expense of a higher execution time for training and testing.
 
 As discussed in section I, due to the initialization process for optimal centroids placements, K-means can result unpreferable for large 
 descriptors' sizes (in absence of methods for dimensionality reduction),
@@ -168,7 +168,7 @@ and in many cases the increase in training time would not justify the minimum in
 For Caltech_101 RF-codebook seems to be the most suitable method to perform RF-classification.
 
 It is observable that for the particular dataset we are analysing the class *water_lilly*
-is the one that gets misclassified the most, both in k-means and RF-codebook (refer to figures \ref{fig:km_cm} and \ref{fig:p3_cm}. This means that the features obtained
+is the one that gets misclassified the most, both in K-means and RF codebooks (refer to figures \ref{fig:km_cm} and \ref{fig:p3_cm}. This means that the features obtained
 from this class do not guarantee very discriminative splits, hence the first splits in the trees
 will prioritize features taken from other classes.
 
-- 
cgit v1.2.3-70-g09d2