author Vasil Zlatanov <v@skozl.com> 2019-02-12 20:13:27 +0000
committer Vasil Zlatanov <v@skozl.com> 2019-02-12 20:13:27 +0000
commit 288e28070d27c496d6ac4af5676734451f8430e9
tree 370a9c789aed36693f882e2652e1ebf45112d957
parent 5ebf5cafe3e6b5ab711ddb3b95299f04c0314333
Properly reference
-rw-r--r-- report/paper.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/report/paper.md b/report/paper.md
index af3f8d3..06d8357 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -16,7 +16,7 @@ The number of clusters or the number of centroids determines the vocabulary size
An example histograms for training and testing images is shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. The histograms of the same class appear to have comparable magnitudes for their respective keywords, demonstrating they had a similar number of descriptors which mapped to each of the clusters. The effect of the vocubalary size (as determined by the number of K-means centroids) on the classificaiton accuracy is shown in figure \ref{fig:km_vocsize}. A small vocabulary size tends to misrepresent the information contained in the different patches, resulting in poor classification accuracy. Conversly a large vocabulary size (many K-mean centroids), may display overfitting. In our tests, we observe a plateau after a cluster count of 60 on figure \ref{fig:km_vocsize}.
-The time complexity of quantisation with a K-means codebooks is $O(DNK)$, where N is the number of entities to be clustered (descriptors), D is the dimension (of the descriptors) and K is the cluster count @cite[km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids (a random selection of 100 thousand descriptors). An alternative method we tried is applying PCA to the descriptors vectors to improve time performance. However in this case the descriptors' size is relatively small, and for such reason we opted to avoid PCA for further training.
+The time complexity of quantisation with a K-means codebooks is $O(DNK)$, where N is the number of entities to be clustered (descriptors), D is the dimension (of the descriptors) and K is the cluster count [@km-complexity]. As the computation time is high, the tests we use a subsample of descriptors to compute the centroids (a random selection of 100 thousand descriptors). An alternative method we tried is applying PCA to the descriptors vectors to improve time performance. However in this case the descriptors' size is relatively small, and for such reason we opted to avoid PCA for further training.
K-means is a process that converges to local optima and heavilly depends on the initialization values of the centroids.
Initializing k-means is an expensive process, based on sequential attempts of centroids placement.