From c28d4ccec5f1677c98945f6370d4f5f181e8dbff Mon Sep 17 00:00:00 2001
From: Vasil Zlatanov
Date: Tue, 20 Nov 2018 14:05:37 +0000
Subject: Improvements to sec3 and elsewhere

---
 report/fig/bagging.pdf         | Bin 11945 -> 14941 bytes
 report/fig/random-ensemble.pdf | Bin 12634 -> 14609 bytes
 report/paper.md                | 29 ++++++++++++-----------------
 3 files changed, 12 insertions(+), 17 deletions(-)
(limited to 'report')

diff --git a/report/fig/bagging.pdf b/report/fig/bagging.pdf
index 602e44f..36c0c6a 100644
Binary files a/report/fig/bagging.pdf and b/report/fig/bagging.pdf differ
diff --git a/report/fig/random-ensemble.pdf b/report/fig/random-ensemble.pdf
index bc969de..9b8b1ab 100644
Binary files a/report/fig/random-ensemble.pdf and b/report/fig/random-ensemble.pdf differ
diff --git a/report/paper.md b/report/paper.md
index 1f36ba0..6442b74 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -271,9 +271,7 @@ With S\textsubscript{B} being the scatter matrix between classes, S\textsubscript{W}
 being the within-class scatter matrix and W being the set of projection vectors.
 $\mu$ represents the mean of each class.
 
-NEED TO REFERENC THIS WHOLE SECTION
-
-It can be proven that when we have a singular S\textsubscript{W} we obtain: $W\textsubscript{opt} = arg\underset{W}max\frac{|W\textsuperscript{T}S\textsubscript{B}W|}{|W\textsuperscript{T}S\textsubscript{W}W|} = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2})$
+It can be proven that when we have a non-singular S\textsubscript{W} we obtain [@lecture-notes]: $W\textsubscript{opt} = \underset{W}{\arg\max}\frac{|W\textsuperscript{T}S\textsubscript{B}W|}{|W\textsuperscript{T}S\textsubscript{W}W|} = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2})$
 
 However S\textsubscript{W} is often singular since the rank of S\textsubscript{W} is at most N-c and usually N is smaller than D.
 In such a case it is possible to use
@@ -346,8 +344,6 @@ S\textsubscript{W} (within-class scatter matrix), respectively show ranks of at most
 N-c (312 maximum for our standard 70-30 split). The rank of S\textsubscript{W} will have
 the same value of $M_{\textrm{pca}}$ for $M_{\textrm{pca}}\leq N-c$.
 
-NEED MORE SCATTER MATRIX CONTENT
-
 Testing with $M_{\textrm{lda}}=50$ and $M_{\textrm{pca}}=115$ gives 92.9% accuracy. The
 results of such a test can be observed in the confusion matrix shown in figure
 \ref{fig:ldapca_cm}.
@@ -420,11 +416,11 @@ In our testing we have elected to use a committee machine employing majority voting
 
 The first strategy which we may use in ensemble learning is randomisation of the data,
 while maintaining the model static.
 
-Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the split training and testing split ratio used with and without bagging. The performance of ensemble classificatioen via a majority voting comittee machine for various ensemble sizes is evaluated in figure \label{fig:bagging-e}. We did not find bagging significantly improving classification error and it seems to platue at t the same testing error as the individual model for a large enseble size.
+Bagging is performed by generating each ensemble's dataset by randomly sampling with replacement. We chose to perform bagging independently for each face so that we maintain the same training and testing split ratio used without bagging. The performance of ensemble classification via a majority voting committee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We find that for our dataset bagging tends to reach the same accuracy as an individual non-bagged model after an ensemble size of around 30 and achieves marginally better testing error, improving accuracy by approximately 1%.
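The patch describes per-face bagging but does not include its implementation. A hedged sketch of the idea (function and variable names are ours, not the repository's): bootstrap-sampling within each class, rather than over the whole training set, keeps every identity's image count fixed, which is what preserves the 70-30 split ratio mentioned above.

```python
import numpy as np

def per_class_bag(X, y, rng):
    """Bootstrap-sample each class independently, with replacement,
    so every face keeps its original number of training images."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        # draw as many samples as the class originally had
        idx.extend(rng.choice(members, size=members.size, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]

# toy demo: 10 samples, 2 classes of 5 images each
rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(10, 2)
y = np.repeat([0, 1], 5)
Xb, yb = per_class_bag(X, y, rng)
print(np.bincount(yb))  # class sizes preserved: [5 5]
```

Duplicated rows are expected here; they are what gives each committee member a slightly different view of the data.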
 \begin{figure}
 \begin{center}
-\includegraphics[width=19em]{fig/bagging.pdf}
+\includegraphics[width=22em]{fig/bagging.pdf}
 \caption{Ensemble size effect on accuracy with bagging}
 \label{fig:bagging-e}
 \end{center}
 \end{figure}
@@ -440,24 +436,23 @@ use the 90 eigenvectors with biggest variance and picking 70 of the rest non-zero
 
 \begin{figure}
 \begin{center}
-\includegraphics[width=19em]{fig/random-ensemble.pdf}
+\includegraphics[width=23em]{fig/random-ensemble.pdf}
 \caption{Ensemble size effect with feature randomisation ($m_c=90$, $m_r=70$)}
 \label{fig:random-e}
 \end{center}
 \end{figure}
 
-In figure \ref{fig:random-e} we can see the effect of ensemble size when using the bigget
-90 eigenvalues and 70 random eigenvalues.
+In figure \ref{fig:random-e} we can see the effect of ensemble size when using the biggest
+90 eigenvectors and 70 random eigenvectors.
 
 As can be seen from the graph, feature space randomisation is able to increase accuracy by
 approximately 2% for our data. However, this improvement is dependent on the number of
 eigenvectors used and the number of random eigenvectors. For example, using a small fully
 random set of eigenvectors is detrimental to the performance.
 
-We noticed that an ensemble size of around 27 is the point wher accuracy or error plateues.
-We will use this number when performing an exhaustive search on the optimal randomness parameter.
+We noticed that an ensemble size of around 27 is the point where accuracy or error plateaus. We will use this number when performing an exhaustive search on the optimal randomness parameter.
 
 ### Optimal randomness hyper-parameter
 
-The randomness hyper-parameter regarding feature space randomsiation can be defined as the number of
+The randomness hyper-parameter regarding feature space randomisation can be defined as the number of
 features we choose to randomise. For instance, in figure \ref{fig:random-e} we chose 70 out of 160
-eigenvalues to be random. We could chose to use more than 70 random eigenvalues, thereby increasing
-the randomness. Conversly we could decrease the randomness parameter, randomising less of the eigenvectors.
+eigenvectors to be random. We could choose to use more than 70 random eigenvectors, thereby increasing
+the randomness. Conversely, we could decrease the randomness parameter, randomising fewer of the eigenvectors.
 
 The optimal number of constant and random eigenvectors to use is therefore an interesting question.
 
@@ -470,9 +465,9 @@ The optimal number of constant and random eigenvectors to use is therefore an interesting question.
 \end{figure}
 
 The optimal randomness, after doing an exhaustive search, as seen in figure \ref{fig:opti-rand}, peaks at
-95 randomised eigenvalues out of 155 total eigenvalues, or 60 static and 95 random eigenvalues. The values of $M_{\textrm{lda}}$ in the figures is the maximum of 51.
+95 randomised eigenvectors out of 155 total eigenvectors, or 60 static and 95 random eigenvectors. The value of $M_{\textrm{lda}}$ in the figures is the maximum of 51.
 
-The red peaks on the 3d-plot represent the proportion of randomised eigenvalues which achieve the optimal accuracy, which have been further plotted in figure \label{opt-2d}
+The red peaks on the 3D plot represent the proportion of randomised eigenvectors which achieves the optimal accuracy; these have been further plotted in figure \ref{opt-2d}.
 
 ## Comparison
-- 
cgit v1.2.3-54-g00ecf
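The feature-space randomisation and the majority-voting committee discussed in the patch are likewise not shown in code. A minimal sketch under our own naming (the PCA/LDA projection and the classifier itself are omitted, and the demo uses toy votes rather than real predictions):

```python
import numpy as np

def random_feature_subspace(evecs, m_c, m_r, rng):
    """Keep the m_c leading eigenvectors (columns of evecs) and add
    m_r more drawn at random, without replacement, from the rest."""
    rest = rng.choice(np.arange(m_c, evecs.shape[1]), size=m_r, replace=False)
    return evecs[:, np.concatenate([np.arange(m_c), rest])]

def majority_vote(votes):
    """Fuse per-model label predictions (n_models x n_samples) by
    taking the most common label for each sample."""
    votes = np.asarray(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# toy committee of 3 models voting on 4 test samples
print(majority_vote([[0, 1, 2, 2],
                     [0, 1, 1, 2],
                     [0, 0, 2, 2]]))  # -> [0 1 2 2]
```

Each committee member would be trained in its own `random_feature_subspace`; the shared leading eigenvectors keep every member reasonably accurate, while the random tail decorrelates their errors so the vote can improve on any single member.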