author     nunzip <np.scarh@gmail.com>    2019-02-14 18:44:50 +0000
committer  nunzip <np.scarh@gmail.com>    2019-02-14 18:44:50 +0000
commit     8fd410bcadd8b3fa3cb0896784b1b3beac542d01 (patch)
tree       348487f683c959ae20f6a0fa1ad8d11ddc0d25b6
parent     e7bb63c5f3195ee0505ff834df5110f1e2a51c70 (diff)
Complete grammar fix
-rw-r--r--  report/paper.md  4
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index 40a2137..81800cb 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -29,7 +29,7 @@ An example of histograms for training and testing images is shown on figure \ref
The time complexity of quantisation with a K-means codebook is $O(DNK)$, where $N$ is the number of entities to be clustered (descriptors), $D$ is their dimension and $K$ is the cluster count [@km-complexity]. As the computation time is high, the tests use a subsample of descriptors to compute the centroids (a random selection of 100,000 descriptors). An alternative method we tried was applying PCA to the descriptor vectors to improve time performance. However, the descriptor dimension of 128 is relatively small, and as such we found PCA to be unnecessary.
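As a concrete illustration, a minimal sketch of this pipeline, assuming scikit-learn (the variable names and the vocabulary size here are illustrative assumptions, not the repository's actual code), fitting the codebook on a 100k subsample and then quantising one image's descriptors:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the pooled 128-D training descriptors described above
rng = np.random.default_rng(0)
descriptors = rng.random((500_000, 128), dtype=np.float32)

# Fit the codebook on a random 100k subsample to tame the O(DNK) cost
subsample = descriptors[rng.choice(len(descriptors), size=100_000, replace=False)]
K = 256  # illustrative vocabulary size
codebook = KMeans(n_clusters=K, n_init=1, random_state=0).fit(subsample)

def bow_histogram(image_descriptors):
    """Quantise one image's descriptors into a K-bin visual-word histogram."""
    words = codebook.predict(image_descriptors)  # nearest-centroid lookup, O(KD) each
    return np.bincount(words, minlength=K)
```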
K-means converges to local optima, and the result depends heavily on the initial values of the centroids.
-Initializing K-means is an expensive process, based on sequential attempts of centroids placement. Running for multiple instances significantly affects the computation process, leading to a linear increase in execution time. We did not observe increase in accuracy with K-means estimator size larger than one, and therefore present results for accuracy and execution time with a single K-Mean estimator.
+Initializing K-means is an expensive process, based on sequential attempts at centroid placement. Running multiple initializations significantly adds to the computation, leading to a linear increase in execution time. We did not observe an increase in accuracy with more than one K-means initialization, and therefore present accuracy and execution-time results for a single K-means initialization.
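Continuing the sketch above (same assumed `K` and `subsample`), the linear growth in execution time can be checked directly through scikit-learn's `n_init` parameter, which sets how many independent centroid initializations are run before the best is kept:

```python
from time import perf_counter
from sklearn.cluster import KMeans

# Each initialization is a full, independent k-means run, so wall-clock time
# should grow roughly linearly with n_init while accuracy stays flat.
for n_init in (1, 2, 4):
    t0 = perf_counter()
    KMeans(n_clusters=K, n_init=n_init, random_state=0).fit(subsample)
    print(f"n_init={n_init}: {perf_counter() - t0:.1f}s")
```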
\begin{figure}
\begin{center}
@@ -101,7 +101,7 @@ Figure \ref{fig:km_cm} shows a confusion matrix for RF Classification on K-means
# RF codebook
-An alternative to codebook creation via K-means involves using an ensemble of totally random trees. We code each decriptor according to which leaf of each tree in the ensemble it is sorted. This effectively performs an unsupervised quantization of our descriptors. The vocabulary size is determined by the number of leaves in each random tree multiplied by the ensemble size. From comparing execution times of K-means in figure \ref{fig:km_vocsize} and the RF codebook in \ref{fig:p3_voc} we observe considerable speed gains from utilising the RF codebook. This may be attributed to the reduce complexity of RF Codebook creation,
+An alternative to codebook creation via K-means involves using an ensemble of totally random trees. We code each descriptor according to the leaf of each tree in the ensemble into which it is sorted. This effectively performs an unsupervised quantisation of our descriptors. The vocabulary size is determined by the number of leaves in each random tree multiplied by the ensemble size. Comparing the execution times of K-means in figure \ref{fig:km_vocsize} and of the RF codebook in figure \ref{fig:p3_voc}, we observe considerable speed gains from utilising the RF codebook. This may be attributed to the reduced complexity of RF codebook creation,
which is $O(\sqrt{D} N \log K)$ compared to $O(DNK)$ for K-means. Mapping descriptors onto a created vocabulary is also quicker than with K-means: $O(\log K)$ (assuming a balanced tree) vs. $O(KD)$.
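A minimal sketch of this kind of codebook, assuming scikit-learn's `RandomTreesEmbedding` as the ensemble of totally random trees (an assumption; the report does not name its implementation):

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Stand-in descriptors; vocabulary size is at most n_estimators * max_leaf_nodes
rng = np.random.default_rng(0)
descriptors = rng.random((100_000, 128), dtype=np.float32)
embedder = RandomTreesEmbedding(n_estimators=100, max_leaf_nodes=80,
                                random_state=0).fit(descriptors)

def rf_histogram(image_descriptors):
    """Code each descriptor by the leaf it reaches in every tree, then sum the
    sparse one-hot leaf encodings into a single histogram for the image."""
    leaves = embedder.transform(image_descriptors)  # sparse (n, total_leaves)
    return np.asarray(leaves.sum(axis=0)).ravel()
```

Since `transform` already yields the one-hot leaf encoding, histogram construction reduces to a single sparse sum, consistent with the $O(\log K)$ per-descriptor mapping cost above.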
The effect of vocabulary size on classification accuracy can be observed both in figure \ref{fig:p3_voc}, in which we independently vary the number of leaves and the ensemble size, and in figure \ref{fig:p3_colormap}, in which both parameters are varied simultaneously. Classification accuracy plateaus for *leaves*$>80$ and *estimators*$>100$. The peaks of 82% accuracy visible on the heatmap in figure \ref{fig:p3_colormap} are highly dependent on the seed and indicate the range of *good* hyperparameters.
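The two-parameter sweep behind the heatmap could be reproduced along these lines (a toy sketch on synthetic data with assumed sizes and grids, not the actual experiment):

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomTreesEmbedding
from sklearn.model_selection import cross_val_score

# Toy stand-in data: 150 "images" of 50 descriptors each, 10 classes
rng = np.random.default_rng(0)
images = rng.random((150, 50, 128), dtype=np.float32)
labels = rng.integers(0, 10, size=150)

def sweep_cell(leaves, estimators):
    """Cross-validated accuracy for one (leaves, estimators) cell of the heatmap."""
    emb = RandomTreesEmbedding(n_estimators=estimators, max_leaf_nodes=leaves,
                               random_state=0).fit(images.reshape(-1, 128))
    hists = np.stack([np.asarray(emb.transform(d).sum(axis=0)).ravel()
                      for d in images])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, hists, labels, cv=3).mean()

for leaves, estimators in itertools.product((20, 80, 160), (50, 100)):
    print(f"leaves={leaves} estimators={estimators}: {sweep_cell(leaves, estimators):.3f}")
```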