1 files changed, 14 insertions, 13 deletions
diff --git a/report/paper.md b/report/paper.md
index 01984cc..7e0704b 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -77,7 +77,7 @@ for PCA when using the low dimensional method. The main advantages of it are red
(since the eigenvectors found with the first method are extracted from a significantly
bigger matrix).
-The drawback of the low-dimensional computation technique is that we include and extra projection step, and as a result do not obtain Hermitian matrix which otherwise simplifies eigenvector computation.
+The drawback of the low-dimensional computation technique is that we include an extra left-multiplication step with the training data, but this is almost always much quicker computationally than performing an eigendecomposition over a large number of features.
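A minimal numpy sketch of the low-dimensional trick, with toy data standing in for the face images: eigendecompose the small $N \times N$ matrix, then left-multiply by the training matrix to recover eigenvectors of the full covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 500, 100                          # toy sizes standing in for pixels x images
A = rng.standard_normal((D, N))
A -= A.mean(axis=1, keepdims=True)       # mean-centred training data, one image per column

# Low-dimensional trick: eigendecompose the small N x N matrix (1/N) A^T A ...
S_small = (A.T @ A) / N
vals, V = np.linalg.eigh(S_small)        # eigenvalues in ascending order

# ... then left-multiply by A to map each eigenvector back to the large
# D x D covariance's eigenspace, and normalise to unit length
U = A @ V
U /= np.linalg.norm(U, axis=0, keepdims=True)

# Check: the last column of U is an eigenvector of (1/N) A A^T with eigenvalue vals[-1]
S_big = (A @ A.T) / N
err = np.linalg.norm(S_big @ U[:, -1] - vals[-1] * U[:, -1])
```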
# Question 1, Application of eigenfaces
@@ -114,11 +114,11 @@ The analysed classification methods used for face recognition are Nearest Neighb
alternative method utilising reconstruction error.
Nearest Neighbor projects the test data onto the generated subspace and finds the closest
-training sample to the projected test image, assigning the same class as that of thenearest neighbor.
+training sample to the projected test image, assigning the same class as that of the nearest neighbor.
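A minimal sketch of the projection and nearest-neighbour step, using toy stand-ins for the eigenface basis and training set (the variable names are illustrative, not our implementation's):

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, N = 200, 20, 60                               # pixels, subspace size, training images
W = np.linalg.qr(rng.standard_normal((D, M)))[0]    # stand-in orthonormal eigenface basis
train = rng.standard_normal((D, N))
train_labels = rng.integers(0, 10, N)
mean_face = train.mean(axis=1, keepdims=True)

# Project the training set and the test image onto the generated subspace
train_proj = W.T @ (train - mean_face)              # M x N
test = train[:, [5]].copy()                         # reuse a training image as the query
test_proj = W.T @ (test - mean_face)                # M x 1

# Nearest neighbour: assign the label of the closest projected training sample
dists = np.linalg.norm(train_proj - test_proj, axis=0)
pred = train_labels[int(np.argmin(dists))]
```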
Recognition accuracy of NN classification can be observed in figure \ref{fig:accuracy}.
-A confusion matrix showing success and failure cases for Nearest Neighbor classfication when using PCA can be observed in figure \ref{fig:cm}:
+A confusion matrix showing success and failure cases for Nearest Neighbor classification when using PCA can be observed in figure \ref{fig:cm}:
\begin{figure}
\begin{center}
@@ -181,7 +181,7 @@ will be used for each generated class-subspace.
\end{center}
\end{figure}
-A confusion matrix showing success and failure cases for alternative method classfication
+A confusion matrix showing success and failure cases for alternative method classification
can be observed in figure \ref{fig:cm-alt}.
\begin{figure}
@@ -346,32 +346,33 @@ the 3 features of the subspaces obtained are graphed.
# Question 3, LDA Ensemble for Face Recognition, PCA-LDA Ensemble
-So far we have established a combined PCA-LDA model which has good recognition while maintaining relatively low execution times and looked at varying hyperparameters.
+So far we have established a combined PCA-LDA model which achieves good recognition accuracy while maintaining relatively low execution times, and we have looked at varying its hyperparameters. We now look to further reduce testing error through the use of ensemble learning.
-## Committee Machine Design
+## Committee Machine Design and Fusion Rules
-Since each model in the ensemble outputs its own predicted labels, we need to define a strategy for combining the predictions such that we obtain a combined response which is better than that of an individual models. For this project, we consider two committee machine designs.
+As each model in the ensemble outputs its own predicted labels, we need to define a strategy for combining the predictions such that we obtain a combined response which is better than that of any individual model. For this project, we consider two committee machine designs.
### Majority Voting
-In simple majority voting the comitee label is the most pouplar label given by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class with the most bins.
+In simple majority voting the committee label is the most popular label given by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class with the most bins.
This technique is not biased towards statistically better models and values all models in the ensemble equally. It is useful when models have similar accuracies and are not specialised in their classification.
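The binning described above can be sketched as:

```python
from collections import Counter

def majority_vote(labels):
    # Bin every label produced by the ensemble and return the most popular one
    return Counter(labels).most_common(1)[0][0]

committee_label = majority_vote(["A", "B", "A", "C", "A"])
```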
-### Confidence Weighted Averaging
+### Confidence and Weighted Labels
Given that the model can output confidences about the labels it predicts, we can factor the model's confidence into the final output of the committee machine. For instance, if a specialised model says with 95% confidence the label for the test case is "A", and two other models only classify it as "B" with 40% confidence, we would be inclined to trust the first model and classify the result as "A".
-This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidence of 60%, 20% and 20% respectively.
+Fusion rules may either take the label with the highest associated confidence, or sum the confidences produced for each label across models and trust the label with the highest total.
-In our testing we have elected to use a committee machine employing majority voting, as we identified that looking a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best.
+This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidences of 60%, 20% and 20% respectively. Using this technique with a large K, however, may be detrimental, as distance is not considered. An alternative approach of generating confidence based on the distance to the nearest neighbour may yield better results.
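A sketch of both ideas, reusing the worked examples above (the function names are illustrative, not our implementation's API):

```python
from collections import Counter, defaultdict

def knn_confidences(neighbour_labels):
    # Confidence for each label = proportion of the K nearest neighbours carrying it
    k = len(neighbour_labels)
    return {label: count / k for label, count in Counter(neighbour_labels).items()}

# The K = 5 example from the text: three "C" neighbours, one "B", one "D"
conf = knn_confidences(["C", "B", "C", "D", "C"])

def sum_rule(per_model_confidences):
    # Fuse by summing each label's confidences across models and
    # trusting the label with the highest total
    totals = defaultdict(float)
    for model_conf in per_model_confidences:
        for label, c in model_conf.items():
            totals[label] += c
    return max(totals, key=totals.get)

# The 95%-vs-two-40% example from the text: the committee trusts "A"
fused = sum_rule([{"A": 0.95}, {"B": 0.40}, {"B": 0.40}])
```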
+In our testing we have elected to use a committee machine employing majority voting, as we identified that a nearest neighbour strategy with only **one** neighbour ($K=1$) performed best. Future research may attempt weighted labelling based on distance-based confidence.
## Data Randomisation (Bagging)
The first strategy we may use in ensemble learning is randomisation of the data, while keeping the model fixed.
-Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the split training and testing split ratio used with and without bagging. The performance of ensemble classificatioen via a majority voting comittee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We find that for our dataset bagging tends to reach the same accuracy as an indivudual non-bagged model after an ensemble size of around 30 and achieves marginally better testing error, improving accuracy by approximately 1%.
+Bagging is performed by generating each dataset for the ensembles by randomly picking with replacement. We chose to perform bagging independently for each face such that we can maintain the training and testing split ratio used with and without bagging. The performance of ensemble classification via a majority voting committee machine for various ensemble sizes is evaluated in figure \ref{fig:bagging-e}. We find that for our dataset bagging tends to reach the same accuracy as an individual non-bagged model after an ensemble size of around 30 and achieves marginally better testing error, improving accuracy by approximately 1%.
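A sketch of per-face resampling with replacement, on a toy label set (so every bag keeps the original per-class counts):

```python
import numpy as np

rng = np.random.default_rng(2)

def bag_per_class(labels, n_models):
    # For each ensemble member, resample indices with replacement independently
    # within each class, preserving the per-class counts of the original set
    labels = np.asarray(labels)
    bags = []
    for _ in range(n_models):
        idx = []
        for c in np.unique(labels):
            members = np.flatnonzero(labels == c)
            idx.extend(rng.choice(members, size=len(members), replace=True))
        bags.append(np.array(idx))
    return bags

face_labels = np.repeat(np.arange(4), 8)     # toy set: 4 faces, 8 images each
bags = bag_per_class(face_labels, n_models=30)
```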
\begin{figure}
\begin{center}
@@ -398,7 +399,7 @@ use the 90 eigenvectors with biggest variance and picking 70 of the rest non-zer
\end{figure}
In figure \ref{fig:random-e} we can see the effect of ensemble size when using the biggest
-90 eigenvectors and 70 random eigenvectors. As can be seen from the graph, feature space randomisation is able to increase accuracy by approximately 2% for our data. However, this improvement is dependent on the number of eigenvectors used and the number of random eigenvectors. For example, using a small fully random set of eigenvectors is detrimental to the performance.
+90 eigenvectors and 70 random eigenvectors. As can be seen from the graph, feature space randomisation is able to increase accuracy by approximately 2% for our data. However, this improvement is dependent on the number of eigenvectors used and the number of random eigenvectors. For example, using a small fully random set of eigenvectors is detrimental to the performance (seen in figure \ref{fig:vaskoplot3}).
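A sketch of this selection, assuming the eigenvector columns are sorted by decreasing eigenvalue (toy orthonormal basis; 90 fixed plus 70 random, as above):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_feature_subspace(eigvecs, n_fixed=90, n_random=70):
    # Keep the n_fixed largest-variance eigenvectors and add n_random more
    # drawn without replacement from the remaining non-zero-variance ones.
    # Assumes eigvecs columns are sorted by decreasing eigenvalue.
    fixed = eigvecs[:, :n_fixed]
    rest = eigvecs[:, n_fixed:]
    pick = rng.choice(rest.shape[1], size=n_random, replace=False)
    return np.hstack([fixed, rest[:, pick]])

eigvecs = np.linalg.qr(rng.standard_normal((300, 250)))[0]  # toy D x M basis
W = random_feature_subspace(eigvecs)
```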
We noticed that an ensemble size of around 27 is the point where accuracy plateaus. We will use this number when performing an exhaustive search for the optimal randomness parameter.