From 66010e7c039bd5cc7879b1beb80ba860188fcbe9 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 12 Feb 2019 17:19:18 +0000
Subject: Typo fixes

---
 report/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/report/paper.md b/report/paper.md
index ac72f2b..1b03992 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -15,9 +15,9 @@ The number of clusters or the number of centroids determine the vocabulary size
 
 ## Bag-of-words histogram quantisation of descriptor vectors
 
-An example histogram for training image shown on figure {fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-mean centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}.
+An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-mean centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}.
 
-The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1))$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count @cite[km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
+The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1})$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count @cite[km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
 
 \begin{figure}[H]
 \begin{center}
-- 
cgit v1.2.3-54-g00ecf

From f8f7f8f692bc960ef5c5be74936a6d1de5a5caac Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 12 Feb 2019 17:22:22 +0000
Subject: Cite correctly

---
 report/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report/paper.md b/report/paper.md
index 1b03992..3e5ec48 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -17,7 +17,7 @@ The number of clusters or the number of centroids determine the vocabulary size
 
 An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-mean centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}.
 
-The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1})$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count @cite[km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
+The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1})$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count [@km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
 
 \begin{figure}[H]
 \begin{center}
-- 
cgit v1.2.3-54-g00ecf
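
The two patches above describe the report's K-means codebook: descriptors are clustered into a visual vocabulary, each image's descriptors are quantised to their nearest centroid, and the resulting word counts form a bag-of-words histogram, with the centroids fitted on a subsample of descriptors because full K-means is expensive. Below is a minimal sketch of that pipeline, assuming NumPy and scikit-learn are available; the function names (`build_codebook`, `bow_histogram`) and parameter values are illustrative and not taken from the report's code.

```python
# Minimal sketch of the K-means codebook and bag-of-words quantisation
# described in the patches above. Assumes scikit-learn/NumPy; names and
# parameter values are illustrative, not from the report's code.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, vocab_size=100, subsample=10000, seed=0):
    """Fit K-means centroids (the visual vocabulary) on a random
    subsample of descriptors to keep the clustering time manageable."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors),
                     size=min(subsample, len(descriptors)), replace=False)
    return KMeans(n_clusters=vocab_size, random_state=seed).fit(descriptors[idx])

def bow_histogram(image_descriptors, codebook):
    """Quantise an image's descriptors to their nearest centroid and
    count how often each visual word occurs."""
    words = codebook.predict(image_descriptors)
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    return hist

# Toy 128-D descriptors standing in for the SIFT vectors used in the report.
train_desc = np.random.rand(5000, 128)
test_desc = np.random.rand(400, 128)
codebook = build_codebook(train_desc, vocab_size=100)
print(bow_histogram(test_desc, codebook))
```

Here `vocab_size` plays the role of the number of centroids whose effect on classification accuracy the report examines in figure \ref{fig:km_vocsize}, and `subsample` reflects the report's use of a descriptor subsample to keep the clustering time down.
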
From 801b30cc67dee6071101320bc9ac1f6edde655f9 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 12 Feb 2019 17:22:55 +0000
Subject: Remove word to improve spacing

---
 report/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/report/paper.md b/report/paper.md
index 3e5ec48..bae8979 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -15,7 +15,7 @@ The number of clusters or the number of centroids determine the vocabulary size
 
 ## Bag-of-words histogram quantisation of descriptor vectors
 
-An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-mean centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}.
+An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}.
 
 The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1})$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count [@km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
 
-- 
cgit v1.2.3-54-g00ecf

From 63e908514ce57e1bc03301c950ffb360976dace9 Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 12 Feb 2019 17:34:41 +0000
Subject: Write section for RF codebook

---
 report/paper.md | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/report/paper.md b/report/paper.md
index bae8979..b6d56dd 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -110,12 +110,7 @@ more. This is due to the complexity added by the two-pixels test, since it adds
 
 # RF codebook
 
-In Q1, replace the K-means with the random forest codebook, i.e. applying RF to 128 dimensional
-descriptor vectors with their image category labels, and using the RF leaves as the visual
-vocabulary. With the bag-of-words representations of images obtained by the RF codebook, train
-and test Random Forest classifier similar to Q2. Try different parameters of the RF codebook and
-RF classifier, and show/discuss the results in comparison with the results of Q2, including the
-vector quantisation complexity.
+An alternative to codebook creation via *K-means* involves using an ensemble of totally random trees. We code each decriptor according to which leaf of each tree in the ensemble it is sorted. This effectively performs and unsupervised transformation of our dataset to a high-dimensional sparse representation. The dimension of the vocubulary size is determined by the number of leaves in each random tree and the ensemble size.
 
 \begin{figure}[H]
 \begin{center}
-- 
cgit v1.2.3-54-g00ecf
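
The patch above replaces the question prompt with the report's RF codebook description: an ensemble of totally random trees codes each descriptor by the leaf it reaches in every tree, an unsupervised transformation into a high-dimensional sparse representation whose size is set by the leaves per tree and the ensemble size. Below is a minimal sketch of that idea, assuming scikit-learn's `RandomTreesEmbedding` is an acceptable stand-in for the report's RF codebook; the function names and parameter values are illustrative.

```python
# Minimal sketch of an RF codebook: code each descriptor by the leaf it
# reaches in an ensemble of totally random trees, then sum the one-hot
# leaf codes over an image's descriptors to form its histogram.
# Assumes scikit-learn's RandomTreesEmbedding stands in for the report's
# RF codebook; names and parameter values are illustrative.
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

def fit_rf_codebook(descriptors, n_trees=20, max_depth=5, seed=0):
    """Unsupervised ensemble of totally random trees; the leaves play the
    role of visual words (at most n_trees * 2**max_depth of them)."""
    return RandomTreesEmbedding(n_estimators=n_trees, max_depth=max_depth,
                                random_state=seed).fit(descriptors)

def rf_bow_histogram(image_descriptors, rf_codebook):
    """Each descriptor gets a sparse one-hot code over all leaves in the
    ensemble; summing the codes over the image gives its histogram."""
    leaf_codes = rf_codebook.transform(image_descriptors)  # sparse (n_desc, n_leaves)
    return np.asarray(leaf_codes.sum(axis=0)).ravel()

# Toy 128-D descriptors standing in for the SIFT vectors used in the report.
train_desc = np.random.rand(5000, 128)
image_desc = np.random.rand(300, 128)
rf_codebook = fit_rf_codebook(train_desc, n_trees=20, max_depth=5)
print(rf_bow_histogram(image_desc, rf_codebook).shape)
```

Summing the sparse leaf codes over an image's descriptors gives its bag-of-words histogram, whose length is the total number of leaves across the ensemble.
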