author Vasil Zlatanov <v@skozl.com> 2019-02-12 17:39:46 +0000
committer Vasil Zlatanov <v@skozl.com> 2019-02-12 17:39:46 +0000
commit 1cfebbbda1fd9fc6d3a4eb9ce9d1ba79971bcc92
tree fa11e6212e8575b6f03947237444d4a1792ecba6
parent 70a47f95ac979869fbed7303e3c370f8b6388dd8
Consitent K-means 2
-rw-r--r-- report/paper.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/report/paper.md b/report/paper.md
index 5c538e9..7483c2e 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -15,7 +15,7 @@ The number of clusters or the number of centroids determines the vocabulary size
## Bag-of-words histogram quantisation of descriptor vectors
-An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-mean centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}. A small vocabulary size turns out to misrepresent the information contained in the different patches, resulting in poor classification accuracy. When the vocabulary size gets too big (too many k-mean centroids), the result is instead overfitting. Figure \ref{fig:km_vocsize} shows a plateau after 60 cluster centers.
+An example histogram for training image shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating they had a descriptors which matched the *keywowrds* in similar proportions. We later look at the effect of the vocubalary size (as determined by the number of K-means centroids) on the classificaiton accuracy in figure \ref{fig:km_vocsize}. A small vocabulary size turns out to misrepresent the information contained in the different patches, resulting in poor classification accuracy. When the vocabulary size gets too big (too many K-means centroids), the result is instead overfitting. Figure \ref{fig:km_vocsize} shows a plateau after 60 cluster centers.
The time complexity of quantisation with a K-means codebooks is $O(n^{dk+1})$ , where n is the number of entities to be clustered, d is the dimension and k is the cluster count @cite[km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids. An alternative method we tried is applying PCA to the descriptors vecotrs to improve time performance. However in this case the descriptors' size is relatively small, and for such reason we opted to avoid PCA for further training.
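
The codebook and histogram steps described in the report text can be illustrated with a minimal sketch. It is not the repository's code: the function names (`kmeans`, `build_codebook`, `bow_histogram`) and the `sample_size` parameter are hypothetical, and plain Lloyd iterations in pure Python stand in for whatever K-means implementation the project uses. The sketch assumes descriptors are equal-length lists of numbers, fits the centroids on a random subsample as the text describes, and then counts how many descriptors of one image fall on each visual word.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v (the visual word for v)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; a heuristic, not an exact K-means solver."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def build_codebook(all_descriptors, k, sample_size, seed=0):
    """Fit k centroids on a random subsample to limit computation time."""
    rng = random.Random(seed)
    n = min(sample_size, len(all_descriptors))
    return kmeans(rng.sample(all_descriptors, n), k, seed=seed)


def bow_histogram(descriptors, centroids):
    """Count how many descriptors of one image map to each visual word."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest(centroids, d)] += 1
    return hist


# Toy usage with made-up 2-D descriptors; real descriptors are much longer.
toy = [[float(i % 5), float(i // 5)] for i in range(50)]
codebook = build_codebook(toy, k=4, sample_size=30)
histogram = bow_histogram(toy[:10], codebook)
```

Here `k` plays the role of the vocabulary size, so the histogram has one bin per centroid and its counts sum to the number of descriptors in the image.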