author    nunzip <np.scarh@gmail.com>  2019-02-15 16:39:44 +0000
committer nunzip <np.scarh@gmail.com>  2019-02-15 16:39:44 +0000
commit    9c3582f77793b6c354ce64e584905e239ad3daf8 (patch)
tree      d24962da0a09c24ee08eefa80f6c5d5d5be0467c
parent    da7d061e29bd62de42f9b7e8e7cc8e3a24e9240f (diff)
parent    b052905b323972a69fbf04fd621a6d88c44c40a1 (diff)
Merge branch 'master' of skozl.com:e4-vision
-rw-r--r--  report/paper.md  16
1 file changed, 10 insertions, 6 deletions
diff --git a/report/paper.md b/report/paper.md
index e5f3ee7..43411bc 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -4,17 +4,21 @@ A common technique for codebook generation involves utilising K-means clustering
of image descriptors. In this way descriptors may be mapped to *visual* words, which lend themselves to
binning and therefore to the creation of bag-of-words histograms for use in classification.
-In this courseworok 100-thousand random SIFT descriptors (size=128) of the Caltech_101 dataset are used to build the K-means visual vocabulary.
+In this coursework 100,000 random SIFT descriptors (with 128 dimensions) from the Caltech_101 dataset are used to build the K-means visual vocabulary.
Both training and testing use 15 randomly selected images from the 10 available classes.
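As a concrete illustration (not the coursework's actual code), a minimal sketch of building such a vocabulary with scikit-learn's `KMeans`, assuming the SIFT descriptors have already been extracted and stacked into an array; the file path is a placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: SIFT descriptors stacked row-wise, shape (n_descriptors, 128),
# assumed to have been extracted beforehand (e.g. with OpenCV's SIFT).
descriptors = np.load("train_descriptors.npy")  # placeholder path, not from the report

# Subsample 100,000 descriptors, as described above, to keep K-means tractable.
rng = np.random.default_rng(0)
idx = rng.choice(len(descriptors), size=100_000, replace=False)
sample = descriptors[idx]

# Fit the codebook: each of the K centroids becomes one visual word.
K = 100  # vocabulary size; the report sweeps this value
codebook = KMeans(n_clusters=K, n_init=1, random_state=0).fit(sample)
```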
## Vocabulary size
The number of clusters, or the number of centroids, determines the vocabulary size when creating the codebook with the K-means method. Each descriptor is assigned to its nearest centroid, so all descriptors belonging to that cluster map to the same *visual word*. Similar descriptors are therefore represented by the same word, which enables comparison through bag-of-words techniques.
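To make the nearest-centroid assignment concrete, a small illustrative helper (not from the coursework code) that maps descriptors to their visual words by brute-force distance computation:

```python
import numpy as np

def quantise(descriptors, centroids):
    """Return, for each descriptor, the index of its nearest centroid (its visual word)."""
    # Squared Euclidean distances between every descriptor and every centroid:
    # shape (n_descriptors, n_centroids).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

With the codebook fitted above, `quantise(image_descriptors, codebook.cluster_centers_)` gives the same assignments as `codebook.predict(image_descriptors)`.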
-## Bag-of-words histogram quantisation of descriptor vectors
+## Bag-of-words histogram of descriptor vectors
-An example of histograms for training and testing images is shown on figure \ref{fig:histo_tr}, computed with a vocubulary size of 100. The histograms of the same class appear to have comparable magnitudes for their respective keywords, demonstrating they had a similar number of descriptors which mapped to each of the clusters. The effect of the vocubalary size (as determined by the number of K-means centroids) on the classificaiton accuracy is shown in figure \ref{fig:km_vocsize}. A small vocabulary size tends to misrepresent the information contained in the different patches, resulting in poor classification accuracy. Conversly a large vocabulary size (many K-mean centroids), may display overfitting. In our tests, we begin to observe a plateau after a cluster count of 60 on figure \ref{fig:km_vocsize}. This proccess of partitioning the input space into K distinct clusters is a form of **vector quantisation**.
+An example of histograms for training and testing images is shown in figure \ref{fig:histo_tr}, computed with a vocabulary size of 100. The histograms of the same class appear to have comparable magnitudes for their respective keywords, demonstrating that they have a similar number of descriptors mapping to each of the clusters. The effect of the vocabulary size (as determined by the number of K-means centroids) on the classification accuracy is shown in figure \ref{fig:km_vocsize}. We find that a small vocabulary size tends to misrepresent the information contained in the different patches, resulting in poor classification accuracy. Conversely, a large vocabulary size (many K-means centroids) may lead to overfitting. In our tests, we begin to observe a plateau after a cluster count of 60 in figure \ref{fig:km_vocsize}. The process of partitioning the input space into K distinct clusters is a form of **vector quantisation**.
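A sketch of how a bag-of-words histogram for a single image could then be formed from its visual-word assignments; the normalisation step is an assumption, not something stated in the report:

```python
import numpy as np

def bow_histogram(image_descriptors, codebook, K):
    # Assign each of the image's descriptors to a visual word ...
    words = codebook.predict(image_descriptors)
    # ... and count how many descriptors fall into each of the K bins.
    hist = np.bincount(words, minlength=K).astype(float)
    # Optional L1 normalisation so images with different descriptor counts
    # remain comparable (an assumption; the report does not specify this).
    return hist / max(hist.sum(), 1.0)
```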
\begin{figure}
\begin{center}
@@ -29,7 +33,7 @@ An example of histograms for training and testing images is shown on figure \ref
The time complexity of quantisation with a K-means codebook is $O(DNK)$, where N is the number of entities to be clustered (descriptors), D is the dimension of the descriptors and K is the cluster count [@km-complexity]. As the computation time is high, the tests use a subsample of descriptors to compute the centroids (a random selection of 100 thousand descriptors). An alternative method we tried is applying PCA to the descriptor vectors to improve time performance. However, the descriptor dimension of 128 is relatively small and as such we found PCA to be unnecessary.
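For reference, the PCA variant mentioned above could be sketched as follows; the target dimension (64) is purely illustrative:

```python
from sklearn.decomposition import PCA

# Hypothetical: project descriptors to a lower dimension before clustering to
# reduce the D factor in the O(DNK) cost. The report found this unnecessary
# at D = 128.
pca = PCA(n_components=64).fit(sample)  # 'sample' as in the earlier sketch
reduced = pca.transform(sample)         # shape (100000, 64)
```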
K-means converges to local optima and depends heavily on the initial placement of the centroids.
-Initializing K-means is an expensive process, based on sequential attempts of centroids placement. Running for multiple instances significantly affects the computation process, leading to a linear increase in execution time. We did not observe increase in accuracy with more than one K-means clusters initializations, and therefore present results for accuracy and execution time with a single K-Mean initialization.
+Initializing K-means is an expensive process, based on sequential attempts at centroid placement. Running multiple initializations significantly affects the computation, leading to a linear increase in execution time. We did not observe an increase in accuracy with more than one K-means initialization, and therefore present results for accuracy and execution time with a single K-means initialization.
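In scikit-learn terms the number of initializations is the `n_init` parameter; a rough way to observe the linear growth in execution time (values illustrative):

```python
import time
from sklearn.cluster import KMeans

# Each extra initialization re-runs the full clustering and keeps the best
# result, so execution time grows roughly linearly with n_init.
for n_init in (1, 5, 10):
    start = time.perf_counter()
    KMeans(n_clusters=100, n_init=n_init, random_state=0).fit(sample)
    print(f"n_init={n_init}: {time.perf_counter() - start:.1f}s")
```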
\begin{figure}
\begin{center}
@@ -42,7 +46,7 @@ Initializing K-means is an expensive process, based on sequential attempts of ce
# RF classifier
-We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees. Random forest classifier performance depends on the ensemble size, tree depth, randomness and weak learner used.
+We use a random forest classifier to label images based on their bag-of-words histograms. A random forest is an ensemble of randomly generated decision trees, whose performance depends on the ensemble size, tree depth, randomness and the weak learner used.
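A minimal sketch with scikit-learn's `RandomForestClassifier`, where `train_hists`/`test_hists` are hypothetical matrices of bag-of-words histograms and the hyperparameter values are placeholders for the ones tuned below:

```python
from sklearn.ensemble import RandomForestClassifier

# n_estimators = ensemble size, max_depth = tree depth,
# max_features = randomness, criterion = split criterion of the weak learner.
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            max_features="sqrt", criterion="gini",
                            random_state=0)
rf.fit(train_hists, train_labels)
accuracy = rf.score(test_hists, test_labels)
```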
## Hyperparameter tuning
@@ -60,7 +64,7 @@ We expect a large tree depth to lead into overfitting. However for the data anal
\end{figure}
Random forests select a random subset of features on which to apply a weak learner (such as an axis-aligned split) and then choose the best of the sampled features to perform the split on, based on a given criterion (our results use the *Gini index*). The fewer features compared at each split, the quicker the trees are built and the more random they are. The randomness parameter can therefore be considered to be the number of features used when making splits. We evaluate accuracy for different randomness values, using a K-means vocabulary of size 100, in figure \ref{fig:kmeanrandom}. The results in figure \ref{fig:kmeanrandom} use a forest size of 100, as we inferred that this is the estimator count at which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
-This parameter also affects correlation between trees. We expect in fact trees to be more correlated when using a large number of features for splits.
+This parameter also affects correlation between trees. We expect trees to be more correlated when using a large number of features for splits.
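As an illustration of how the randomness parameter maps onto an implementation, a sketch of the kind of sweep behind figure \ref{fig:kmeanrandom}; in scikit-learn it corresponds to `max_features`, and the specific values here are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

# Fewer features per split -> faster to build and more random (less correlated)
# trees. Forest size fixed at 100 estimators, Gini criterion as in the report.
for max_features in (1, 2, 5, 10, "sqrt", None):  # None = consider all features
    rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                                max_features=max_features, random_state=0)
    rf.fit(train_hists, train_labels)
    print(max_features, rf.score(test_hists, test_labels))
```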
\begin{figure}
\begin{center}