author	nunzip <np.scarh@gmail.com>	2019-02-15 16:59:27 +0000
committer	nunzip <np.scarh@gmail.com>	2019-02-15 16:59:27 +0000
commit	e3e713a66b0a1e85714d764663823c92ffbd1f2d (patch)
tree	801445bc48fde6a5ec4c43c8ad5e2b0a3edc8ef8
parent	4f9214360fffadd86fb767f3b2322d657567851d (diff)
Section II grammar fix
-rw-r--r--	report/paper.md	4
1 file changed, 2 insertions, 2 deletions
diff --git a/report/paper.md b/report/paper.md
index e44444b..885f27d 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -42,7 +42,7 @@ Initializing K-means is an expensive process, based on sequential attempts of ce
# RF classifier
-We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees, who's performance depends on the ensemble size, tree depth, randomness and weak learner used.
+We use a random forest classifier to label images based on the bag-of-words histograms. Random forests are an ensemble of randomly generated decision trees, whose performance depends on the ensemble size, tree depth, randomness and weak learner used.
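For reference, a minimal sketch of such a classifier is given below, assuming scikit-learn's `RandomForestClassifier` and hypothetical arrays (`train_hist`, `train_labels`, `test_hist`) standing in for the per-image bag-of-words histograms; the report's own pipeline may differ.

```python
# Minimal sketch: random forest over bag-of-words histograms (assumes scikit-learn).
# The arrays below are synthetic placeholders for the real SIFT/codeword histograms.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
train_hist = rng.randint(0, 20, size=(150, 100))   # 150 images, 100-word vocabulary
train_labels = rng.randint(0, 10, size=150)        # 10 image classes
test_hist = rng.randint(0, 20, size=(30, 100))

forest = RandomForestClassifier(
    n_estimators=100,      # ensemble size
    max_depth=None,        # tree depth (None = grow trees fully)
    max_features="sqrt",   # randomness: features sampled at each split
    criterion="gini",      # axis-aligned weak learner scored by the Gini index
)
forest.fit(train_hist, train_labels)
predictions = forest.predict(test_hist)
```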
## Hyperparameters tuning
@@ -59,7 +59,7 @@ We expect a large tree depth to lead into overfitting. However for the data anal
\end{center}
\end{figure}
-Random forests will select a random number of features on which to apply a weak learner (such as axis aligned split) and then choose the best feature of the sampled ones to perform the split on, based on a given criteria (our results use the *Gini index*). The fewer features that are compared for each split the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered as the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary of size 100 in figure \ref{fig:kmeanrandom}. The results in the figure \ref{fig:kmeanrandom} use a forest size of 100 as we infered that this is the estimatator count for which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
+Random forests will select a random number of features on which to apply a weak learner (such as an axis-aligned split) and then choose the best feature of the sampled ones to perform the split on, based on a given criterion (our results use the *Gini index*). The fewer features that are compared for each split, the quicker the trees are built and the more random they are. Therefore the randomness parameter can be considered as the number of features used when making splits. We evaluate accuracy given different randomness when using a K-means vocabulary of size 100 in figure \ref{fig:kmeanrandom}. The results in figure \ref{fig:kmeanrandom} also use a forest size of 100 as we inferred that this is the estimator count for which performance gains tend to plateau (when selecting $\sqrt{n}$ random features).
This parameter also affects correlation between trees. We expect trees to be more correlated when using a large number of features for splits.
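For reference, a minimal sketch of a randomness sweep of this kind, again assuming scikit-learn and synthetic placeholder arrays in place of the real bag-of-words histograms; the actual experiments may have been run differently.

```python
# Sketch of sweeping the randomness parameter (features compared per split),
# assuming scikit-learn; arrays are synthetic stand-ins for the real histograms.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
train_hist, train_labels = rng.randint(0, 20, (150, 100)), rng.randint(0, 10, 150)
test_hist, test_labels = rng.randint(0, 20, (30, 100)), rng.randint(0, 10, 30)

for n_features in (1, 2, 5, 10, "sqrt", None):   # None = compare all 100 features
    forest = RandomForestClassifier(
        n_estimators=100,          # forest size at which accuracy gains plateau
        max_features=n_features,   # randomness: features compared at each split
        criterion="gini",
        random_state=0,
    )
    forest.fit(train_hist, train_labels)
    print(n_features, forest.score(test_hist, test_labels))
```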
\begin{figure}