-rwxr-xr-x | report/paper.md | 18
1 file changed, 8 insertions, 10 deletions
diff --git a/report/paper.md b/report/paper.md
index 32db134..0f385c1 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -28,7 +28,7 @@ as a sudden drop for eigenvalues after the 363rd.
 
 The mean image is calculated by averaging the features of the
 training data. Changing the randomisation seed gives
-very similar values, since the vast majority of the training
+similar values, since the majority of the training
 faces used for averaging are the same. Two mean faces
 obtained with different seeds for split can be seen in
 figure \ref{fig:mean_face}.
@@ -69,7 +69,7 @@ and eigenvectors of the matrices A\textsuperscript{T}A (NxN) and AA\textsuperscr
 are shown in Appendix, table \ref{tab:eigen}.
 
 It can be proven that the eigenvalues obtained are mathematically the same [@lecture-notes],
-and the there is a relation between the eigenvectors obtained: $\boldsymbol{u\textsubscript{i}} = A\boldsymbol{v\textsubscript{i}}$. (*Proof in appendix A*).
+and there is a relation between the eigenvectors obtained: $\boldsymbol{u}_i = A\boldsymbol{v}_i$ (*Proof: Appendix A*).
 
 Experimentally there is no consequential loss of data calculating the eigenvectors
 for PCA when using the low dimensional method. The main advantages of it are reduced computation time,
@@ -282,8 +282,7 @@ In this section we will perform PCA-LDA recognition with NN classification.
 
 Varying the values of $M_{\textrm{pca}}$ and $M_{\textrm{lda}}$ we obtain the average recognition accuracies
 reported in figure \ref{fig:ldapca_acc}. Peak accuracy of 93% can be observed for $M_{\textrm{pca}}=115$, $M_{\textrm{lda}}=41$;
-howeverer accuracies above 90% can be observed for $M_{\textrm{pca}}$ values between 90 and 130 and
-$M_{\textrm{lda}}$ values between 30 and 50.
+however accuracies above 90% can be observed for $90 < M_{\textrm{pca}} < 130$ and $30 < M_{\textrm{lda}} < 50$.
 
 Recognition accuracy is significantly higher than PCA, and the run time is roughly the same,
 varying between 0.11s (low $M_{\textrm{pca}}$) and 0.19s (high $M_{\textrm{pca}}$). Execution times
@@ -368,7 +367,7 @@ Fusion rules may either take the label with the highest associated confidence, o
 
 This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidence of 60%, 20% and 20% respectively. Using this technique with a large K however may be detrimental, as distance is not considered. An alternative approach of generating confidence based on the distance to the nearest neighbour may yield better results.
 
-In our testing we have elected to use a committee machine employing majority voting, as we identified that looking a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best. Future research may attempt using weighted labeling based on neighbour distance based confidence.
+In our testing we have elected to use a committee machine employing majority voting, as we identified that a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best. Future work may investigate weighted labelling using a confidence based on neighbour distance.
 
 ## Data Randomisation (Bagging)
 
@@ -401,9 +400,9 @@ use the 90 eigenvectors with biggest variance and picking 70 of the rest non-zer
 \end{figure}
 
 In figure \ref{fig:random-e} we can see the effect of ensemble size when using the biggest
-90 eigenvectors and 70 random eigenvectors. Feature space randomisation is able to increase accuracy by approximately 2% for our data. However, this improvement is dependent on the number of eigenvectors used and the number of random eigenvectors. For example, using a small fully random set of eigenvectors is detrimental to the performance (seen on \ref{fig:vaskoplot3}).
+90 constant and 70 random eigenvectors. Feature space randomisation is able to increase accuracy by approximately 2% for our data. This improvement depends on the number of eigenvectors used and on how many of them are random: using a small, fully random set of eigenvectors is detrimental to performance.
 
-We noticed that an ensemble size of around 27 is the point where accuracy or error plateaus. We will use this number when performing an exhaustive search on the optimal randomness parameter.
+An ensemble size of around 27 is where accuracy or error plateaus. We will use this number when performing an exhaustive search on the optimal randomness parameter.
 
 ### Optimal randomness hyper-parameter
 
@@ -440,7 +439,7 @@ The red peaks on the 3d-plot represent the proportion of randomised eigenvectors
 
 \begin{figure}
 \begin{center}
-\includegraphics[width=17em]{fig/ensemble-cm.pdf}
+\includegraphics[width=15em]{fig/ensemble-cm.pdf}
 \caption{Ensemble confusion matrix (pre-committee)}
 \label{fig:ens-cm}
 \end{center}
@@ -450,7 +449,7 @@ We can compute an ensemble confusion matrix before the committee machines as sho
 ## Comparison
 
-Combining bagging and feature space randomization we are able to consistently achieve higher test accuracy than the individual models. In table \ref{tab:compare} $70/30$ splits.
+Combining bagging and feature space randomisation, we are able to consistently achieve higher test accuracy than the individual models.
 
 \begin{table}[ht]
 \begin{tabular}{lrr}
 \hline
@@ -459,7 +458,6 @@ Seed & Individual$(M=120)$ & Bag + Feature Ens.$(M=60+95)$\\ \hline
 1 & 0.929 & 0.942 \\
 5 & 0.897 & 0.910 \\ \hline
 \end{tabular}
-\caption{Comparison of individual and ensemble}
 \label{tab:compare}
 \end{table}
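
The hunk at `@@ -69,7 +69,7 @@` relies on the low-dimensional PCA computation, where the eigenvectors of the small matrix $A^TA$ are mapped to eigenvectors of $AA^T$ via $\boldsymbol{u}_i = A\boldsymbol{v}_i$. Below is a minimal numpy sketch of that relation; the dimensions, variable names and normalisation are illustrative assumptions, not the report's code.

```python
import numpy as np

# Minimal sketch of the low-dimensional PCA trick (illustrative, not the report's code).
# A is the D x N matrix of mean-centred training faces; D, N and all names are assumptions.
rng = np.random.default_rng(0)
D, N = 2576, 364                       # assumed image and training-set sizes
A = rng.standard_normal((D, N))
A -= A.mean(axis=1, keepdims=True)     # subtract the mean image from every column

# Eigendecomposition of the small N x N matrix A^T A ...
evals_small, V = np.linalg.eigh(A.T @ A)

# ... yields eigenvectors of the large D x D matrix AA^T through u_i = A v_i
U = A @ V
U /= np.linalg.norm(U, axis=0)         # re-normalise each u_i to unit length

# The non-zero eigenvalues of AA^T match those of A^T A (up to numerical error)
evals_big = np.linalg.eigvalsh(A @ A.T)
print(np.allclose(np.sort(evals_big)[-N:], np.sort(evals_small), atol=1e-6))
```

The main saving is that only an NxN eigendecomposition is needed instead of a DxD one, which is where the reduced computation time mentioned in the hunk comes from.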
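The fusion hunk (`@@ -368,7 +367,7 @@`) describes deriving label confidences from the proportion of the $K$ nearest neighbours belonging to each class, and fusing ensemble members with majority voting. A small sketch of both ideas follows, under assumed data layouts (one face vector per row, matching label array); the function names are ours, not the report's.

```python
from collections import Counter
import numpy as np

def knn_confidences(train_x, train_y, query, k=5):
    """Label confidences from the share of the k nearest neighbours in each class.

    Sketch of the fusion idea in the text (not the report's code): with k=5 and
    neighbour labels [C, C, C, B, D] it returns {C: 0.6, B: 0.2, D: 0.2}.
    Assumes train_x holds one face vector per row and train_y the matching labels.
    """
    dists = np.linalg.norm(np.asarray(train_x) - np.asarray(query), axis=1)  # Euclidean distances
    nearest = np.asarray(train_y)[np.argsort(dists)[:k]]                     # labels of the k closest faces
    return {label: count / k for label, count in Counter(nearest).items()}

def committee_majority_vote(member_predictions):
    """Fuse the hard labels produced by the ensemble members by majority voting."""
    return Counter(member_predictions).most_common(1)[0][0]
```

With $K=1$ each member reduces to plain nearest-neighbour classification, which is the configuration the report found to perform best before fusing the members' labels with the majority vote.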
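The feature-space randomisation hunks describe building each ensemble member from the 90 eigenvectors with the largest eigenvalues plus 70 drawn at random from the remaining non-zero ones. A possible implementation sketch, assuming the eigenvectors are stored as columns sorted by decreasing eigenvalue; names and defaults are ours, not the report's.

```python
import numpy as np

def random_feature_subspace(eigvecs, n_constant=90, n_random=70, rng=None):
    """Eigenvector selection for one ensemble member: the n_constant eigenvectors with
    the largest eigenvalues plus n_random picked at random from the remaining ones.

    Sketch under assumptions: eigvecs holds eigenvectors as columns, already sorted
    by decreasing eigenvalue and restricted to the non-zero ones; names are ours.
    """
    rng = np.random.default_rng() if rng is None else rng
    remaining = np.arange(n_constant, eigvecs.shape[1])
    random_pick = rng.choice(remaining, size=n_random, replace=False)
    cols = np.concatenate([np.arange(n_constant), random_pick])
    return eigvecs[:, cols]   # D x (n_constant + n_random) projection basis
```

Each member would project the data onto its own basis and train its classifier there, with the members' hard predictions then fused by the committee machine sketched above.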