Diffstat (limited to 'report')
-rw-r--r-- | report/bibliography.bib | 17
-rw-r--r-- | report/paper.md | 10
2 files changed, 22 insertions, 5 deletions
diff --git a/report/bibliography.bib b/report/bibliography.bib
index fe80de4..5d4e51e 100644
--- a/report/bibliography.bib
+++ b/report/bibliography.bib
@@ -1,3 +1,20 @@
+@inproceedings{km-complexity,
+ author = {Inaba, Mary and Katoh, Naoki and Imai, Hiroshi},
+ title = {Applications of Weighted Voronoi Diagrams and Randomization to Variance-based K-clustering: (Extended Abstract)},
+ booktitle = {Proceedings of the Tenth Annual Symposium on Computational Geometry},
+ series = {SCG '94},
+ year = {1994},
+ isbn = {0-89791-648-4},
+ location = {Stony Brook, New York, USA},
+ pages = {332--339},
+ numpages = {8},
+ url = {http://doi.acm.org/10.1145/177424.178042},
+ doi = {10.1145/177424.178042},
+ acmid = {178042},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+}
+
 @article{rerank-paper,
 author = {Zhun Zhong and
 Liang Zheng and
diff --git a/report/paper.md b/report/paper.md
index 7453289..ac72f2b 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -11,11 +11,13 @@ Caltech dataset.
 
 ## Vocabulary size
 
-The number of clusters or the number of centroids determine the vocabulary size.
+The number of clusters, that is, the number of centroids, determines the vocabulary size when creating the codebook with the K-means method. Each descriptor is mapped to the nearest centroid, and each descriptor belonging to that cluster is mapped to the same *visual word*. This allows similar descriptors to be mapped to the same word, allowing for comparison through bag-of-words techniques.
 
-## Bag-of-words histograms of example training/testing images
+## Bag-of-words histogram quantisation of descriptor vectors
 
-Looking at picture \ref{fig:histo_te}
+An example histogram for a training image is shown in figure \ref{fig:histo_tr}, computed with a vocabulary size of 100. A corresponding testing image of the same class is shown in figure \ref{fig:histo_te}. The histograms appear to have similar counts for the same words, demonstrating that both images had descriptors which matched the *keywords* in similar proportions. We later look at the effect of the vocabulary size (as determined by the number of K-means centroids) on the classification accuracy in figure \ref{fig:km_vocsize}.
+
+The time complexity of quantisation with a K-means codebook is $O(n^{dk+1})$, where $n$ is the number of entities to be clustered, $d$ is the dimension and $k$ is the cluster count @cite[km-complexity]. As the computation time is high, the tests use a subsample of descriptors to compute the centroids. An alternative method is NUNZIO PUCCI WRITE HERE
 
 \begin{figure}[H]
 \begin{center}
@@ -35,8 +37,6 @@ Looking at picture \ref{fig:histo_te}
 \end{center}
 \end{figure}
 
-## Vector quantisation process
-
 # RF classifier
 
 ## Hyperparameters tuning
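
The codebook and histogram steps described in the changed paragraphs can be illustrated with a minimal pure-Python sketch. It is not the code used for the report: the function names (`kmeans`, `build_codebook`, `bow_histogram`), the fixed iteration count, the random initialisation and the `sample_size` parameter are all assumptions made for illustration. It uses plain Lloyd iterations, which are a heuristic and not the exact algorithm whose complexity is cited above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (its visual word)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; assumes k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def build_codebook(descriptors, k, sample_size, seed=0):
    """Cluster only a random subsample of descriptors to reduce run time."""
    rng = random.Random(seed)
    sample = rng.sample(descriptors, min(sample_size, len(descriptors)))
    return kmeans(sample, k, seed=seed)


def bow_histogram(descriptors, codebook):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2-D descriptors; real image descriptors have far more dimensions.
    descriptors = [[rng.random(), rng.random()] for _ in range(200)]
    codebook = build_codebook(descriptors, k=5, sample_size=50)
    print(bow_histogram(descriptors, codebook))
```

The vocabulary size is the `k` passed to `build_codebook`, and `sample_size` controls how many descriptors take part in clustering; every descriptor of an image is still quantised against the full codebook when its histogram is built.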