-rwxr-xr-x | report/paper.md | 54 |
1 files changed, 54 insertions, 0 deletions
diff --git a/report/paper.md b/report/paper.md
index d887919..b8f444e 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -387,7 +387,61 @@ the 3 features of the subspaces obtained are graphed.

# Question 3, LDA Ensemble for Face Recognition, PCA-LDA Ensemble

So far we have established a combined PCA-LDA model which achieves good recognition accuracy while maintaining relatively low execution times, and we have examined the effect of varying its hyperparameters.

## Committee Machine Design

Since each model in the ensemble outputs its own predicted labels, we need to define a strategy for combining the predictions such that we obtain a combined response which is better than that of any individual model. For this project, we consider two committee machine designs.

### Majority Voting

In simple majority voting the committee label is the most popular label output by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class whose bin holds the most votes.

This technique is not biased towards statistically better models and values all models in the ensemble equally. It is useful when the models have similar accuracies and are not specialised in classifying particular subsets of the data.

### Confidence Weighted Averaging

Given that a model can output a confidence score for the label it predicts, we can factor that confidence into the final output of the committee machine. For instance, if a specialised model reports with 95% confidence that the label for the test case is "A", while two other models classify it as "B" with only 40% confidence, we would be inclined to trust the first model and classify the result as "A".

This technique relies on the model producing a confidence score for the label(s) it predicts. For K-Nearest Neighbours with $K > 1$ we may derive a confidence from the proportion of the $K$ nearest neighbours which are of the same class.
For instance, if $K = 5$ and 3 of the 5 nearest neighbours are of class "C" while the other two are of classes "B" and "D", then we may say that the predictions are classes C, B and D, with confidences of 60%, 20% and 20% respectively.

In our testing we elected to use a committee machine employing majority voting, as we found that a nearest-neighbour strategy with only **one** neighbour ($K=1$) performed best.

## Data Randomisation (Bagging)

The first strategy we may use in ensemble learning is randomisation of the data, while keeping the model fixed.

Bagging is performed by generating the dataset for each model in the ensemble by sampling randomly with replacement. We chose to perform bagging independently for each face so that the training/testing split ratio is maintained with and without bagging.

![Ensemble size effect on accuracy with bagging\label{fig:bagging-e}](fig/bagging.pdf)

## Feature Space Randomisation

Feature space randomisation involves randomising the features analysed by the model. In the case of PCA-LDA this can be achieved by randomising the eigenvectors used in the PCA step. For instance, instead of choosing the 120 most variant eigenfaces, we may choose the 90 eigenvectors with the largest variance and pick a further 70 of the remaining non-zero eigenvectors at random.

![Ensemble size effect on accuracy with 160 eigenvalues ($m_c=90$, $m_r=70$)\label{fig:random-e}](fig/random-ensemble.pdf)

In figure \ref{fig:random-e} we can see the effect of ensemble size when using the 90 largest eigenvalues and 70 random eigenvalues.

We noticed that an ensemble size of around 27 is the point where the accuracy (or error) plateaus. We will use this number when performing an exhaustive search for the optimal randomness parameter.
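The building blocks described above can be sketched in a few lines of Python. This is an illustrative sketch only (the function names and toy data are ours, not from the project code), assuming features are indexed by eigenvalue rank in descending order of variance and that labels are plain strings:

```python
import random
from collections import Counter

def pick_feature_indices(n_total, m_c, m_r, rng):
    """Feature-space randomisation: keep the m_c most variant eigenvectors
    (indices 0..m_c-1, assuming descending eigenvalue order) and add m_r
    more drawn at random from the remaining non-zero ones."""
    return list(range(m_c)) + rng.sample(range(m_c, n_total), m_r)

def knn_confidences(neighbour_labels):
    """K-NN confidence: the proportion of the K nearest neighbours
    that share each candidate label."""
    k = len(neighbour_labels)
    return {label: count / k for label, count in Counter(neighbour_labels).items()}

def majority_vote(model_labels):
    """Committee output: for each test case, the label predicted by
    the most models in the ensemble (model_labels is models x cases)."""
    return [Counter(case).most_common(1)[0][0] for case in zip(*model_labels)]

rng = random.Random(0)
# 90 fixed + 70 random eigenvectors out of 160, as in figure fig:random-e.
indices = pick_feature_indices(160, m_c=90, m_r=70, rng=rng)

# The K=5 confidence example from the text: C -> 0.6, B -> 0.2, D -> 0.2.
print(knn_confidences(["C", "C", "C", "B", "D"]))

# A 3-model ensemble voting on 3 test cases.
print(majority_vote([["A", "B", "B"],
                     ["A", "B", "C"],
                     ["B", "B", "C"]]))  # -> ['A', 'B', 'C']
```

Confidence weighted averaging would replace `majority_vote` with a sum of per-label confidences across models, but since we use $K=1$ (where every vote has confidence 1) the two designs coincide and majority voting suffices.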
### Optimal randomness hyper-parameter

The randomness hyper-parameter for feature space randomisation can be defined as the number of features we choose to randomise. For instance, in figure \ref{fig:random-e} we chose 70 out of 160 eigenvalues to be random. We could choose more than 70 random eigenvalues, thereby increasing the randomness. Conversely, we could decrease the randomness parameter, randomising fewer of the eigenvectors.

The optimal number of constant and random eigenvectors to use is therefore an interesting question.

After an exhaustive search, the optimal randomness peaks at 95 randomised eigenvalues out of 155 total, i.e. 60 static and 95 random eigenvalues.

## Comparison

Combining bagging and feature space randomisation we are able to achieve higher test accuracy than an individual model.

### Ensemble Confusion Matrix

![Ensemble confusion matrix\label{fig:ens-cm}](fig/ensemble-cm.pdf)

# References