From 3d94d641aa8d3d25a7d29cf8643d4d2bdba41b7a Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 20 Nov 2018 12:47:50 +0000
Subject: Use subscript in M_pca and M_lda

---
 report/paper.md | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

(limited to 'report')

diff --git a/report/paper.md b/report/paper.md
index 454c35e..1f36ba0 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -325,13 +325,13 @@ after each step.

 In this section we will perform PCA-LDA recognition with NN classification.

-Varying the values of M_pca and M_lda we obtain the average recognition accuracies
-reported in figure \ref{fig:ldapca_acc}. Peak accuracy of 93% can be observed for M_pca=115, M_lda=41;
-howeverer accuracies above 90% can be observed for M_pca values between 90 and 130 and
-M_lda values between 30 and 50.
+Varying the values of $M_{\textrm{pca}}$ and $M_{\textrm{lda}}$ we obtain the average recognition accuracies
+reported in figure \ref{fig:ldapca_acc}. A peak accuracy of 93% can be observed for $M_{\textrm{pca}}=115$, $M_{\textrm{lda}}=41$;
+however, accuracies above 90% can be observed for $M_{\textrm{pca}}$ values between 90 and 130 and
+$M_{\textrm{lda}}$ values between 30 and 50.

 Recognition accuracy is significantly higher than PCA, and the run time is roughly the same,
-vaying between 0.11s(low M_pca) and 0.19s(high M_pca).
+varying between 0.11s (low $M_{\textrm{pca}}$) and 0.19s (high $M_{\textrm{pca}}$).

 \begin{figure}
 \begin{center}
@@ -344,11 +344,11 @@ vaying between 0.11s(low M_pca) and 0.19s(high M_pca).

 The scatter matrices obtained, S\textsubscript{B}(scatter matrix between classes) and S\textsubscript{W}(within-class scatter matrix), respectively show ranks of at most c-1(51) and N-c(312 maximum for our standard 70-30 split).
-The rank of S\textsubscript{W} will have the same value of M_pca for M_pca$\leq$N-c.
+The rank of S\textsubscript{W} will have the same value as $M_{\textrm{pca}}$ for $M_{\textrm{pca}}\leq N-c$.

 NEED MORE SCATTER MATRIX CONTENT

-Testing with M_lda=50 and M_pca=115 gives 92.9% accuracy. The results of such test can be
+Testing with $M_{\textrm{lda}}=50$ and $M_{\textrm{pca}}=115$ gives 92.9% accuracy. The results of such a test can be
 observed in the confusion matrix shown in figure \ref{fig:ldapca_cm}.

 \begin{figure}
@@ -420,7 +420,7 @@ In our testing we have elected to use a committee machine employing majority vot

 The first strategy which we may use when using ensemble learning is randomisation of the data, while maintaining the model static.

-Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the split training and testing split ratio used with and without bagging.
+Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the training and testing split ratio used with and without bagging. The performance of ensemble classification via a majority voting committee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We did not find that bagging significantly improves classification error; it seems to plateau at the same testing error as the individual model for a large ensemble size.

 \begin{figure}
 \begin{center}
@@ -433,23 +433,31 @@ Bagging is performed by generating each dataset for the ensembles by randomly pi

 ## Feature Space Randomisation

-Feature space randomisations involves randomising the features which are analysed by the model. In the case of PCA-LDA this can be achieved by randomising the eigenvectors used when performing the PCA step. For instance, instead of choosing the most variant 120 eigenfaces, we may chose to use the 90 eigenvectors with biggest variance and picking 70 of the rest non-zero eigenvectors randomly.
+Feature space randomisation involves randomising the features which are analysed by the model.
+In the case of PCA-LDA this can be achieved by randomising the eigenvectors used when performing
+the PCA step. For instance, instead of choosing the 120 most variant eigenfaces, we may choose to
+use the 90 eigenvectors with the biggest variance and pick 70 of the remaining non-zero eigenvectors randomly.

 \begin{figure}
 \begin{center}
 \includegraphics[width=19em]{fig/random-ensemble.pdf}
-\caption{Ensemble size effect on accraucy with 160 eigenvalues ($m_c=90$,$m_r=70$)}
+\caption{Ensemble size effect with feature randomisation ($m_c=90$, $m_r=70$)}
 \label{fig:random-e}
 \end{center}
 \end{figure}

-In figure \ref{fig:random-e} we can see the effect of ensemble size when using the bigget 90 eigenvalues and 70 random eigenvalues.
+In figure \ref{fig:random-e} we can see the effect of ensemble size when using the biggest
+90 eigenvalues and 70 random eigenvalues.

-We noticed that an ensemble size of around 27 is the point wher accuracy or error plateues. We will use this number when performing an exhaustive search on the optimal randomness parameter.
+We noticed that an ensemble size of around 27 is the point where accuracy and error plateau.
+We will use this number when performing an exhaustive search on the optimal randomness parameter.

 ### Optimal randomness hyper-parameter

-The randomness hyper-parameter regarding feature space randomsiation can be defined as the number of features we chose to randomise. For instance the figure \ref{fig:random-e} we chose 70 out of 160 eigenvalues to be random. We could chose to use more than 70 random eigenvalues, thereby increasing the randomness. Conversly we could decrease the randomness parameter, randomising less of the eigenvectors.
+The randomness hyper-parameter regarding feature space randomisation can be defined as the number of
+features we choose to randomise. For instance, in figure \ref{fig:random-e} we chose 70 out of 160
+eigenvalues to be random. We could choose to use more than 70 random eigenvalues, thereby increasing
+the randomness. Conversely, we could decrease the randomness parameter, randomising fewer of the eigenvectors.

 The optimal number of constant and random eigenvectors to use is therefore an interesting question.

@@ -461,12 +469,17 @@ The optimal number of constant and random eigenvectors to use is therefore an in
 \end{center}
 \end{figure}

-The optimal randomness after doing an exhaustive search as seen on figure \label{fig:opti-rand}peaks at 95 randomised eigenvalues out of 155 total eigenvalues, or 60 static and 95 random eigenvalues. The values of $M_{\textrm{lda}}$ in the figures is the maximum of 51.
+The optimal randomness after an exhaustive search, as seen in figure \ref{fig:opti-rand}, peaks at
+95 randomised eigenvalues out of 155 total eigenvalues, or 60 static and 95 random eigenvalues. The value of $M_{\textrm{lda}}$ in the figures is the maximum of 51.
+
+The red peaks on the 3D plot represent the proportions of randomised eigenvalues which achieve the optimal accuracy; these have been further plotted in figure \ref{opt-2d}.

 ## Comparison

 Combining bagging and feature space randomization we are able to achieve higher test accuracy than the individual models.

+### Various Splits/Seeds
+
 ### Ensemble Confusion Matrix

 \begin{figure}
--
cgit v1.2.3-54-g00ecf
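The per-face bagging described in the patch (resampling with replacement independently within each class, so the train/test split ratio is preserved) can be sketched as follows. This is a hypothetical illustration, not code from this repository; the function name `bag_per_class` and the use of NumPy are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_per_class(X, y):
    """Bootstrap independently within each class (face), so every class
    keeps its original number of training samples in each replicate."""
    idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        # sample with replacement, same count as the class originally had
        idx.extend(rng.choice(members, size=members.size, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]
```

Each ensemble member would be trained on one such replicate; because each class is resampled separately, no class is over- or under-represented in any replicate.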
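The feature-space randomisation and majority-voting committee machine discussed above can be sketched as below. This is a minimal sketch under stated assumptions, not the repository's implementation: eigenvector columns are assumed pre-sorted by descending eigenvalue, and the names `random_eigvec_subset` and `majority_vote` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_eigvec_subset(eigvecs, m_c=90, m_r=70):
    """Keep the m_c most-variant eigenvectors (columns assumed sorted by
    descending eigenvalue) plus m_r drawn at random from the remainder."""
    pool = np.arange(m_c, eigvecs.shape[1])
    chosen = rng.choice(pool, size=m_r, replace=False)
    return eigvecs[:, np.concatenate([np.arange(m_c), chosen])]

def majority_vote(member_preds):
    """Committee machine: per-sample majority vote over ensemble members."""
    member_preds = np.asarray(member_preds)  # shape (n_members, n_samples)
    voted = []
    for col in member_preds.T:
        labels, counts = np.unique(col, return_counts=True)
        voted.append(labels[np.argmax(counts)])
    return np.asarray(voted)
```

In an ensemble, each member would call `random_eigvec_subset` with its own random draw before the LDA step, and `majority_vote` would combine the members' per-sample predictions.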