 report/fig/bagging.pdf         | bin 14941 -> 15360 bytes
 report/fig/ensemble-cm.pdf     | bin 12553 -> 12995 bytes
 report/fig/random-ensemble.pdf | bin 14609 -> 15037 bytes
 report/paper.md                |  39 +++++++++++++++-------------
 4 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/report/fig/bagging.pdf b/report/fig/bagging.pdf
index 36c0c6a..3700851 100755..100644
--- a/report/fig/bagging.pdf
+++ b/report/fig/bagging.pdf
Binary files differ
diff --git a/report/fig/ensemble-cm.pdf b/report/fig/ensemble-cm.pdf
index f7f8659..f79b924 100755..100644
--- a/report/fig/ensemble-cm.pdf
+++ b/report/fig/ensemble-cm.pdf
Binary files differ
diff --git a/report/fig/random-ensemble.pdf b/report/fig/random-ensemble.pdf
index 9b8b1ab..6123af1 100755..100644
--- a/report/fig/random-ensemble.pdf
+++ b/report/fig/random-ensemble.pdf
Binary files differ
diff --git a/report/paper.md b/report/paper.md
index a01c9b2..bd7ef71 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -77,7 +77,7 @@ for PCA when using the low dimensional method. The main advantages of it are red
(since the eigenvectors found with the first method are extracted from a significantly
bigger matrix).
-The drawback of the low-dimensional computation technique is that we include and extra projection step, and as a result do not obtain Hermitian matrix which otherwise simplifies eigenvector computation.
+The drawback of the low-dimensional computation technique is that we include an extra left multiplication step with the training data, but it is almost always computationally much quicker than performing eigen-decomposition for a large number of features.
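The extra left multiplication mentioned above can be sketched as follows; a minimal numpy illustration under assumed dimensions (the data shapes and variable names here are illustrative, not taken from the report). Eigen-decomposing the small $N \times N$ matrix $A^{T}A$ and left-multiplying its eigenvectors by $A$ recovers eigenvectors of the large $D \times D$ covariance $AA^{T}$:

```python
import numpy as np

# Hypothetical sketch of the low-dimensional PCA trick. Instead of
# eigen-decomposing the D x D matrix A A^T (D = number of pixels),
# decompose the much smaller N x N matrix A^T A (N = number of
# training images) and recover the high-dimensional eigenvectors
# with one extra left multiplication by A.
rng = np.random.default_rng(0)
D, N = 2576, 10                     # illustrative sizes, not from the report
A = rng.standard_normal((D, N))     # stand-in for mean-centred data, one image per column

vals, V = np.linalg.eigh(A.T @ A)   # cheap: N x N eigen-decomposition
U = A @ V                           # extra step: if (A^T A) v = l v, then (A A^T)(A v) = l (A v)
U /= np.linalg.norm(U, axis=0)      # renormalise the recovered eigenvectors
```

The check that each column of `U` really is an eigenvector of `A @ A.T` with eigenvalue `vals[i]` follows directly from the identity in the comment.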
# Question 1, Application of eigenfaces
@@ -114,11 +114,11 @@ The analysed classification methods used for face recognition are Nearest Neighb
alternative method utilising reconstruction error.
Nearest Neighbor projects the test data onto the generated subspace and finds the closest
-training sample to the projected test image, assigning the same class as that of thenearest neighbor.
+training sample to the projected test image, assigning the same class as that of the nearest neighbor.
Recognition accuracy of NN classification can be observed in figure \ref{fig:accuracy}.
-A confusion matrix showing success and failure cases for Nearest Neighbor classfication when using PCA can be observed in figure \ref{fig:cm}:
+A confusion matrix showing success and failure cases for Nearest Neighbor classification when using PCA can be observed in figure \ref{fig:cm}:
\begin{figure}
\begin{center}
@@ -182,7 +182,7 @@ A major drawback is the increase in execution time (from table \ref{tab:time}, 1
\end{center}
\end{figure}
-A confusion matrix showing success and failure cases for alternative method classfication
+A confusion matrix showing success and failure cases for alternative method classification
can be observed in figure \ref{fig:cm-alt}.
\begin{figure}
@@ -348,32 +348,33 @@ the 3 features of the subspaces obtained are graphed.
# Question 3, LDA Ensemble for Face Recognition, PCA-LDA Ensemble
-So far we have established a combined PCA-LDA model which has good recognition while maintaining relatively low execution times and looked at varying hyperparameters.
+So far we have established a combined PCA-LDA model which achieves good recognition accuracy while maintaining relatively low execution times, and we have looked at varying its hyperparameters. We look to further reduce testing error through the use of ensemble learning.
-## Committee Machine Design
+## Committee Machine Design and Fusion Rules
-Since each model in the ensemble outputs its own predicted labels, we need to define a strategy for combining the predictions such that we obtain a combined response which is better than that of an individual models. For this project, we consider two committee machine designs.
+As each model in the ensemble outputs its own predicted labels, we need to define a strategy for combining the predictions such that we obtain a combined response which is better than that of the individual models. For this project, we consider two committee machine designs.
### Majority Voting
-In simple majority voting the comitee label is the most pouplar label given by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class with the most bins.
+In simple majority voting the committee label is the most popular label given by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class with the most bins.
This technique is not biased towards statistically better models and values all models in the ensemble equally. It is useful when models have similar accuracies and are not specialised in their classification.
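The binning described above can be sketched in a few lines; a hypothetical helper, not code from the project (ties are broken by first occurrence):

```python
from collections import Counter

def majority_vote(ensemble_labels):
    # Bin every label produced by the ensemble and return the class
    # with the most votes; all models are weighted equally.
    return Counter(ensemble_labels).most_common(1)[0][0]

# Three of four models predict "A", so the committee outputs "A".
committee_label = majority_vote(["A", "B", "A", "A"])
```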
-### Confidence Weighted Averaging
+### Confidence and Weighted Labels
Given that the model can output confidences about the labels it predicts, we can factor the confidence of the model towards the final output of the committee machine. For instance, if a specialised model says with 95% confidence the label for the test case is "A", and two other models only classify it as "B" with 40% confidence, we would be inclined to trust the first model and classify the result as "A".
-This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidence of 60%, 20% and 20% respectively.
+Fusion rules may either take the label with the highest associated confidence, or otherwise look at the sum of all produced confidences for a given label and trust the label with the highest confidence sum.
-In our testing we have elected to use a committee machine employing majority voting, as we identified that looking a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best.
+This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidence of 60%, 20% and 20% respectively. Using this technique with a large K however may be detrimental, as distance is not considered. An alternative approach of generating confidence based on the distance to the nearest neighbour may yield better results.
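The proportion-based confidence and the sum fusion rule above can be sketched as follows (hypothetical helpers for illustration; distance is deliberately ignored, as noted):

```python
from collections import Counter

def knn_confidences(neighbour_labels):
    # Confidence for each label = proportion of the K neighbours
    # voting for it, K = len(neighbour_labels).
    k = len(neighbour_labels)
    return {label: count / k for label, count in Counter(neighbour_labels).items()}

def fuse_by_sum(per_model_confidences):
    # Sum-rule fusion: add each label's confidence across models and
    # trust the label with the highest total.
    totals = Counter()
    for confidences in per_model_confidences:
        totals.update(confidences)
    return totals.most_common(1)[0][0]

# K = 5: three neighbours of class "C", one "B", one "D".
conf = knn_confidences(["C", "C", "C", "B", "D"])
```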
+In our testing we have elected to use a committee machine employing majority voting, as we identified that a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best. Future research may attempt weighted labelling with confidences derived from neighbour distance.
## Data Randomisation (Bagging)
The first strategy we may employ in ensemble learning is randomisation of the data, while keeping the model static.
-Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the split training and testing split ratio used with and without bagging. The performance of ensemble classificatioen via a majority voting comittee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We find that for our dataset bagging tends to reach the same accuracy as an indivudual non-bagged model after an ensemble size of around 30 and achieves marginally better testing error, improving accuracy by approximately 1%.
+Bagging is performed by generating each dataset for the ensemble by randomly picking from the class training set with replacement. We chose to perform bagging independently for each face such that we can maintain the training and testing split ratio used with and without bagging. The performance of ensemble classification via a majority voting committee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We find that for our dataset bagging tends to reach the same accuracy as an individual non-bagged model after an ensemble size of around 30 and achieves marginally better testing error, improving accuracy by approximately 1%.
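The per-face bagging step can be sketched as below; a hypothetical helper assuming the training data is grouped by class, with one bootstrap replica drawn per face so each class keeps its original number of training samples:

```python
import random

def bag_per_class(class_training_sets, seed=0):
    # Sample with replacement from each face's own training images,
    # so the train/test split ratio per class is preserved.
    rng = random.Random(seed)
    return {face: [rng.choice(images) for _ in images]
            for face, images in class_training_sets.items()}

# Illustrative data: two faces with three training images each.
train = {"face1": ["a1", "a2", "a3"], "face2": ["b1", "b2", "b3"]}
replica = bag_per_class(train)
```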
\begin{figure}
\begin{center}
@@ -394,13 +395,13 @@ use the 90 eigenvectors with biggest variance and picking 70 of the rest non-zer
\begin{figure}
\begin{center}
\includegraphics[width=23em]{fig/random-ensemble.pdf}
-\caption{Ensemble size effect with feature randomisation ($m_c=90$,$m_r=70$)}
+\caption{Ensemble size - feature randomisation ($m_c=90$,$m_r=70$)}
\label{fig:random-e}
\end{center}
\end{figure}
In figure \ref{fig:random-e} we can see the effect of ensemble size when using the biggest
-90 eigenvectors and 70 random eigenvectors. As can be seen from the graph, feature space randomisation is able to increase accuracy by approximately 2% for our data. However, this improvement is dependent on the number of eigenvectors used and the number of random eigenvectors. For example, using a small fully random set of eigenvectors is detrimental to the performance.
+90 eigenvectors and 70 random eigenvectors. Feature space randomisation is able to increase accuracy by approximately 2% for our data. However, this improvement is dependent on the number of eigenvectors used and the number of random eigenvectors. For example, using a small fully random set of eigenvectors is detrimental to performance (seen in figure \ref{fig:opti-rand}).
We noticed that an ensemble size of around 27 is the point where accuracy or error plateaus. We will use this number when performing an exhaustive search on the optimal randomness parameter.
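The eigenvector selection for one ensemble member can be sketched as follows; a hypothetical helper using the $m_c=90$, $m_r=70$ values from the report (the total count of non-zero eigenvectors, `n_nonzero`, is a placeholder, not a figure from the report):

```python
import random

def random_subspace_indices(m_c=90, m_r=70, n_nonzero=300, seed=0):
    # Keep the m_c largest-variance eigenvectors, then draw m_r more
    # without replacement from the remaining non-zero eigenvectors.
    rng = random.Random(seed)
    constant = list(range(m_c))            # biggest-variance block, shared by all members
    pool = range(m_c, n_nonzero)           # the rest, candidates for randomisation
    return constant + rng.sample(pool, m_r)

indices = random_subspace_indices()        # 90 fixed + 70 random indices
```

Each ensemble member would call this with a different seed, so members share the high-variance directions but differ in their randomised tail.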
@@ -415,20 +416,20 @@ The optimal number of constant and random eigenvectors to use is therefore an in
\begin{figure}
\begin{center}
-\includegraphics[width=23em]{fig/vaskplot3.pdf}
+\includegraphics[width=19em]{fig/vaskplot3.pdf}
\caption{Recognition accuracy varying M and Randomness Parameter}
\label{fig:opti-rand}
\end{center}
\end{figure}
-The optimal randomness after doing an exhaustive search as seen on figure \label{fig:opti-rand}peaks at
-95 randomised eigenvectors out of 155 total eigenvectors, or 60 static and 95 random eigenvectors. The values of $M_{\textrm{lda}}$ in the figures is the maximum of 51.
+The optimal randomness after performing an exhaustive search, as seen in figure \ref{fig:opti-rand}, peaks at
+95 randomised eigenvectors out of 155 total eigenvectors, or 60 static and 95 random eigenvectors. The value of $M_{\textrm{lda}}$ in the figures is 51.
The red peaks on the 3d-plot represent the proportion of randomised eigenvectors which achieve the optimal accuracy, which have been further plotted in figure \ref{fig:opt-2d}. We found that for our data, the optimal ratio of random eigenvectors for a given $M$ is between $0.6$ and $0.9$.
\begin{figure}
\begin{center}
-\includegraphics[width=19em]{fig/nunzplot1.pdf}
+\includegraphics[width=17em]{fig/nunzplot1.pdf}
\caption{Optimal randomness ratio}
\label{fig:opt-2d}
\end{center}
@@ -439,7 +440,7 @@ The red peaks on the 3d-plot represent the proportion of randomised eigenvectors
\begin{figure}
\begin{center}
-\includegraphics[width=19em]{fig/ensemble-cm.pdf}
+\includegraphics[width=17em]{fig/ensemble-cm.pdf}
\caption{Ensemble confusion matrix (pre-committee)}
\label{fig:ens-cm}
\end{center}