Write bag-of-words section

author: Vasil Zlatanov <v@skozl.com> 2019-02-12 17:14:56 +0000
committer: Vasil Zlatanov <v@skozl.com> 2019-02-12 17:14:56 +0000
commit: 333d158dd0bac1e1fee86c6399f763dea22a90ea (patch)
tree: 77649045bfaa8fc7d57e682fbb6eeb8d8d3be127 /report/paper.md
parent: ed2219fbb0a66c5e6d6eccad58c131e2d1ff299c (diff)
download: e4-vision-333d158dd0bac1e1fee86c6399f763dea22a90ea.tar.gz
e4-vision-333d158dd0bac1e1fee86c6399f763dea22a90ea.tar.bz2
e4-vision-333d158dd0bac1e1fee86c6399f763dea22a90ea.zip
1 files changed, 5 insertions, 5 deletions
diff --git a/report/paper.md b/report/paper.md
index 7453289..ac72f2b 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -11,11 +11,13 @@ Caltech dataset.
 
 ## Vocabulary size 
 
-The number of clusters or the number of centroids determine the vocabulary size.
+The number of clusters, or equivalently the number of centroids, determines the vocabulary size when creating the codebook with the K-means method. Each descriptor is mapped to its nearest centroid, and every descriptor belonging to that cluster is mapped to the same *visual word*. This allows similar descriptors to be mapped to the same word, enabling comparison through bag-of-words techniques.
 
-## Bag-of-words histograms of example training/testing images
+## Bag-of-words histogram quantisation of descriptor vectors
 
-Looking at picture \ref{fig:histo_te}
+An example histogram for a training image is shown in figure \ref{fig:histo_tr}, computed with a vocabulary size of 100. The histogram of a corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The two histograms appear to have similar counts for the same words, demonstrating that the images had descriptors which matched the *keywords* in similar proportions. We later look at the effect of the vocabulary size (as determined by the number of K-means centroids) on the classification accuracy in figure \ref{fig:km_vocsize}.
+
+The time complexity of computing an optimal K-means codebook is $O(n^{dk+1})$, where $n$ is the number of entities to be clustered, $d$ is the dimension and $k$ is the cluster count @cite[km-complexity]. As the computation time is high, our tests use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
 
 \begin{figure}[H]
 \begin{center}
@@ -35,8 +37,6 @@ Looking at picture \ref{fig:histo_te}
 \end{center}
 \end{figure}
 
-## Vector quantisation process
-
 # RF classifier 
 
 ## Hyperparameters tuning
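
The codebook and histogram steps described in the added paragraphs can be illustrated with a short sketch. This is not the repository's code: the function names `build_codebook`, `quantise` and `bow_histogram`, the synthetic 128-dimensional descriptors, the subsample size and the number of iterations are all assumptions made for illustration. It runs a few plain Lloyd iterations on a random subsample of descriptors, maps each descriptor to its nearest centroid, and counts visual words per image.

```python
import numpy as np


def quantise(descriptors, centroids):
    """Map each descriptor to the index of its nearest centroid (its visual word)."""
    # Squared Euclidean distances, shape (n_descriptors, k).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors, k, sample_size, n_iter=10, seed=0):
    """Run a few Lloyd (K-means) iterations on a random subsample of descriptors."""
    rng = np.random.default_rng(seed)
    # Subsample the descriptors to keep the clustering time manageable.
    n = min(sample_size, len(descriptors))
    sample = descriptors[rng.choice(len(descriptors), size=n, replace=False)]
    # Initialise the centroids from k distinct sampled descriptors.
    centroids = sample[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantise(sample, centroids)
        for j in range(k):
            members = sample[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(descriptors, centroids):
    """Count how often each visual word occurs among one image's descriptors."""
    words = quantise(descriptors, centroids)
    return np.bincount(words, minlength=len(centroids))


# Synthetic stand-ins for real descriptors; 128 dimensions is an assumption.
rng = np.random.default_rng(1)
train_descriptors = rng.normal(size=(2000, 128))
image_descriptors = rng.normal(size=(300, 128))

# Vocabulary size of 100, as used for the example histograms in the report.
codebook = build_codebook(train_descriptors, k=100, sample_size=500)
histogram = bow_histogram(image_descriptors, codebook)
```

The histogram has one bin per centroid, so its length equals the vocabulary size and its counts sum to the number of descriptors in the image. Histograms of two images can then be compared bin by bin, which is the comparison the report makes between the training and testing examples.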