author    Vasil Zlatanov <v@skozl.com>    2019-02-15 17:41:45 +0000
committer Vasil Zlatanov <v@skozl.com>    2019-02-15 17:41:45 +0000
commit    4ea551472c92e0387fcd8f934fdefb81db93e8a0 (patch)
tree      91eb405c06a4c0726d481f72551f5a7ab53e0936
parent    404d54d233e6d1b3616a9a38a9421a0f06513be3 (diff)
parent    d2f8f7376a4a785f11f062dfd81ba83b9fb83cd3 (diff)
Merge branch 'master' of skozl.com:e4-vision
-rw-r--r--   report/paper.md   6
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index 6c7c0ed..89ad94e 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -42,7 +42,7 @@ Initializing K-means is an expensive process, based on sequential attempts of ce
# RF classifier
-We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees, who's performance depends on the ensemble size, tree depth, randomness and weak learner used.
+We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees, whose performance depends on the ensemble size, tree depth, randomness and the weak learner used.
## Hyperparameters tuning
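The hunk above describes the random-forest classifier applied to the bag-of-words histograms; a minimal sketch of such a pipeline is given below. The report's own implementation is not shown in this diff, so the scikit-learn API, the placeholder data and every variable name here are assumptions for illustration only.

```python
# Sketch only (assumed scikit-learn API, placeholder data): fit a random
# forest on bag-of-words histograms, exposing the hyperparameters the
# report discusses (ensemble size, tree depth, randomness, weak learner).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((150, 100))          # placeholder: 150 images, 100-bin vocabulary
y_train = rng.integers(0, 10, size=150)   # placeholder: 10 class labels

clf = RandomForestClassifier(
    n_estimators=100,      # ensemble size
    max_depth=None,        # tree depth (None = grow until leaves are pure)
    max_features="sqrt",   # features sampled per split (the randomness parameter)
    criterion="gini",      # split criterion, matching the report's *Gini index*
    random_state=0,
)
clf.fit(X_train, y_train)
```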
@@ -59,7 +59,7 @@ We expect a large tree depth to lead into overfitting. However for the data anal
\end{center}
\end{figure}
-Random forests will select a random number of features on which to apply a weak learner (such as axis aligned split) and then choose the best feature of the sampled ones to perform the split on, based on a given criteria (our results use the *Gini index*). The fewer features that are compared for each split the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered as the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary of size 100 in figure \ref{fig:kmeanrandom}. The results in the figure \ref{fig:kmeanrandom} use a forest size of 100 as we infered that this is the estimatator count for which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
+Random forests will select a random subset of features on which to apply a weak learner (such as an axis-aligned split) and then choose the best feature of the sampled ones to perform the split on, based on a given criterion (our results use the *Gini index*). The fewer features that are compared for each split, the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered as the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary of size 100 in figure \ref{fig:kmeanrandom}. The results in figure \ref{fig:kmeanrandom} also use a forest size of 100, as we inferred that this is the estimator count at which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
This parameter also affects correlation between trees. We expect trees to be more correlated when using a large number of features for splits.
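The hunk above treats the number of features sampled per split as the randomness parameter and fixes the forest size at 100; a hedged sketch of such a sweep follows, again assuming scikit-learn and placeholder data rather than the report's actual code.

```python
# Sketch only (assumed scikit-learn API, placeholder data): vary the number
# of features considered at each split while keeping 100 trees, and report
# held-out accuracy for each setting.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 100))            # placeholder BoW histograms (100-word vocabulary)
y = rng.integers(0, 10, size=300)     # placeholder class labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for max_features in (1, 2, 5, 10, "sqrt", None):   # None = compare all 100 features
    clf = RandomForestClassifier(n_estimators=100, max_features=max_features,
                                 criterion="gini", random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"max_features={max_features}: accuracy={clf.score(X_te, y_te):.3f}")
```

Fewer features per split should build quicker, less correlated trees; `max_features=None` corresponds to the highly correlated extreme the paragraph describes.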
\begin{figure}
@@ -148,7 +148,7 @@ In many applications the increase in training time would not justify the small i
For the `Caltech_101` dataset, a RF codebook seems to be the most suitable method to perform RF classification.
-The `water_lilly` is the most misclassified class, both for K-means and RF codebook (refer to figures \ref{fig:km_cm} and \ref{fig:p3_cm}). This indicates that the features obtained from the class do not provide for very discriminative splits, resulting in the prioritsation of other features in the first nodes of the decision trees.
+The `water_lilly` is the most misclassified class, both for the K-means and RF codebooks (refer to figures \ref{fig:km_cm} and \ref{fig:p3_cm}). This indicates that the quantised descriptors obtained from the class do not provide very discriminative splits, resulting in the prioritisation of other features in the first nodes of the decision trees.
# References
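The final hunk attributes the `water_lilly` misclassification to weakly discriminative splits, pointing to the confusion matrices in figures \ref{fig:km_cm} and \ref{fig:p3_cm}. A hedged sketch of how such a per-class matrix could be produced is below, reusing the fitted `clf` and the held-out split `X_te`, `y_te` from the previous sketch (all placeholder names, not the report's code).

```python
# Sketch only (assumed scikit-learn API): per-class confusion matrix for the
# placeholder forest fitted above; rows are true classes, columns predictions.
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_te)
cm = confusion_matrix(y_te, y_pred)
print(cm)
```

A class whose row mass is spread across many columns is consistently confused with others, which is what the report attributes to its quantised descriptors rarely driving the early splits of the trees.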