author     Vasil Zlatanov <v@skozl.com>    2018-11-20 12:26:20 +0000
committer  Vasil Zlatanov <v@skozl.com>    2018-11-20 12:26:20 +0000
commit     40bff320aacdda8d577a7ebd9a74aeae45523ed8 (patch)
tree       c26c5691e705bb10f6b6c2eaf78ab252048e265c
parent     fcc4990e364ab0df19cec513cda90f3f49e2efae (diff)
Add references (and minor grammar fixes)
-rw-r--r--  report/bibliography.bib | 27
-rwxr-xr-x  report/paper.md         | 22
2 files changed, 30 insertions, 19 deletions
diff --git a/report/bibliography.bib b/report/bibliography.bib
index 5bee281..5c58f17 100644
--- a/report/bibliography.bib
+++ b/report/bibliography.bib
@@ -1,10 +1,21 @@
-@misc{djangoproject_models_2016,
-  title = {Models and Databases | {{Django}} Documentation | {{Django}}},
-  timestamp = {2016-12-19T03:31:30Z},
-  urldate = {2016-12-19},
-  howpublished = {\url{https://docs.djangoproject.com/en/1.10/topics/db/}},
-  author = {{djangoproject}},
-  month = dec,
-  year = {2016}
+@misc{lecture-notes,
+  title = {EE4-68 Pattern Recognition Lecture Notes},
+  organization = {{Imperial College London}},
+  timestamp = {2018-12-20T03:31:30Z},
+  urldate = {2018-12-19},
+  author = {Tae-Kyun Kim},
+  year = {2018},
 }
 
+@INPROCEEDINGS{pca-lda,
+  author = {N. Zhao and W. Mio and X. Liu},
+  booktitle = {The 2011 International Joint Conference on Neural Networks},
+  title = {A hybrid PCA-LDA model for dimension reduction},
+  year = {2011},
+  volume = {},
+  number = {},
+  pages = {2184-2190},
+  keywords = {data analysis;learning (artificial intelligence);principal component analysis;hybrid {PCA-LDA} model;linear discriminant analysis;within-class scatter under projection;low-dimensional subspace;discrimination performance;hybrid dimension reduction model;dimension reduction algorithm;face recognition},
+  doi = {10.1109/IJCNN.2011.6033499},
+  ISSN = {2161-4407},
+  month = {July},}
diff --git a/report/paper.md b/report/paper.md
index e67b3d0..5e20702 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -94,11 +94,11 @@ PCA &Fast PCA\\
 \label{tab:eigen}
 \end{table}
 
-It can be proven that the eigenvalues obtained are mathematically the same,
+It can be proven that the eigenvalues obtained are mathematically the same [@lecture-notes],
 and the there is a relation between the eigenvectors obtained:
 
 Computing the eigenvectors **u\textsubscript{i}** for the DxD matrix AA\textsuperscript{T}
-we obtain a very large matrix. The computation process can get very expensive when D>>N.
+we obtain a very large matrix. The computation process can get very expensive when $D>>N$.
 
 For such reason we compute the eigenvectors **v\textsubscript{i}** of the NxN matrix A\textsuperscript{T}A.
 From the computation it follows that $A\textsuperscript{T}A\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}\boldsymbol{v\textsubscript{i}}$.
@@ -285,10 +285,10 @@ of the projected samples: $W\textsuperscript{T}\textsubscript{pca} = arg\underse
 
 = arg\underset{W}max\frac{|W\textsuperscript{T}W\textsuperscript{T}
 \textsubscript{pca}S\textsubscript{B}W\textsubscript{pca}W|}{|W\textsuperscript{T}W\textsuperscript{T}\textsubscript{pca}S\textsubscript{W}W\textsubscript{pca}W|}$.
 
-Anyways performing PCA followed by LDA carries a loss of discriminative information. Such problem can
-be avoided by a linear combination of the two. In the following section we will use a 1-dimensional
-subspace *e*. The cost functions associated with PCA and LDA (with $\epsilon$ being a very small number)
-are H\textsubscript{pca}(*e*)=
+However, performing PCA followed by LDA carries a loss of discriminative information. This problem can
+be avoided through a linear combination of the two [@pca-lda]. In the following section we will use a
+1-dimensional subspace *e*. The cost functions associated with PCA and LDA (with $\epsilon$ being a very
+small number) are H\textsubscript{pca}(*e*)=
 <*e*, S\textsubscript{e}> and $H\textsubscript{lda}(e)=\frac{<e, S\textsubscript{B}e>}
 {<e,(S\textsubscript{W} + \epsilon I)e>}= \frac{<e, S\textsubscript{B}e>}{<e,S\textsubscript{W}e> + \epsilon}$.
@@ -403,13 +403,13 @@ Since each model in the ensemble outputs its own predicted labels, we need to de
 
 ### Majority Voting
 
-In simple majority voting we the committee label is the most popular label outputted by all the models. This can be achieved by binning all labels produced by the ensemble of models and classifying the test case as the class with the most bins.
+In simple majority voting the committee label is the most popular label given by the models. This can be achieved by binning all labels produced by the ensemble and classifying the test case as the class with the most bins.
 
-This technique does is not bias towards statistically better models and values all models in the ensemble equally. It is useful when models have similar accuracies and our not specialised in classifying in their classification.
+This technique is not biased towards statistically better models and values all models in the ensemble equally. It is useful when models have similar accuracies and are not specialised in their classification.
 
 ### Confidence Weighted Averaging
 
-Given that the model can output confidence about the label it is able to predict, we can factor the confidence of the model towards the final output of the committee machine. For instance, if a specialised model says with 95% confidence the label for the test case is "A", and two other models only classify it as "B" with 40% confidence, we would be inclined to trust the first model and classify the result as "A".
+Given that the model can output confidences about the labels it predicts, we can factor the confidence of the model towards the final output of the committee machine. For instance, if a specialised model says with 95% confidence the label for the test case is "A", and two other models only classify it as "B" with 40% confidence, we would be inclined to trust the first model and classify the result as "A".
 
 This technique is reliant on the model producing a confidence score for the label(s) it guesses. For K-Nearest neighbours where $K > 1$ we may produce a confidence based on the proportion of the K nearest neighbours which are the same class. For instance if $K = 5$ and 3 out of the 5 nearest neighbours are of class "C" and the other two are class "B" and "D", then we may say that the predictions are classes C, B and D, with confidence of 60%, 20% and 20% respectively.
@@ -438,7 +438,7 @@ Feature space randomisations involves randomising the features which are analyse
 \begin{figure}
 \begin{center}
 \includegraphics[width=19em]{fig/random-ensemble.pdf}
-\caption{Ensemble size effect on accraucy with 160 eigenvalues (mc=90,mr=70)}
+\caption{Ensemble size effect on accuracy with 160 eigenvalues ($m_c=90$,$m_r=70$)}
 \label{fig:random-e}
 \end{center}
 \end{figure}
@@ -472,7 +472,7 @@ Combining bagging and feature space randomization we are able to achieve higher
 \begin{figure}
 \begin{center}
 \includegraphics[width=19em]{fig/ensemble-cm.pdf}
-\caption{Ensemble confusion matrix}
+\caption{Ensemble confusion matrix (pre-committee)}
 \label{fig:ens-cm}
 \end{center}
 \end{figure}
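The eigenvector relation referenced in the first paper.md hunk (when $D>>N$, eigendecompose the small NxN matrix A\textsuperscript{T}A and map each **v\textsubscript{i}** back to **u\textsubscript{i}** = A**v\textsubscript{i}**) can be illustrated with a minimal NumPy sketch. This is not code from this repository; the sizes and variable names are invented for the example.

```python
import numpy as np

# Illustrative sizes only: D-dimensional image vectors, N training samples, D >> N.
rng = np.random.default_rng(0)
D, N = 1024, 100
A = rng.standard_normal((D, N))
A -= A.mean(axis=1, keepdims=True)      # mean-centred data matrix, one image per column

# Direct route: eigendecompose the D x D matrix (1/N) A A^T  -- expensive when D >> N.
# Cheap route:  eigendecompose the N x N matrix (1/N) A^T A instead.
evals_small, V = np.linalg.eigh(A.T @ A / N)

# Keep the non-trivial directions and map each v_i back to u_i = A v_i, then renormalise.
keep = evals_small > 1e-8 * evals_small.max()
U = A @ V[:, keep]
U /= np.linalg.norm(U, axis=0)

# The non-zero eigenvalues of the two matrices coincide, as the report states.
evals_big = np.linalg.eigvalsh(A @ A.T / N)
assert np.allclose(evals_big[-keep.sum():], evals_small[keep], atol=1e-8)
```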
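The cost functions quoted in the second hunk, H\textsubscript{pca}(*e*) = <*e*, S*e*> and $H\textsubscript{lda}(e)=\frac{<e, S\textsubscript{B}e>}{<e,S\textsubscript{W}e> + \epsilon}$, can be evaluated directly once the scatter matrices are built. The toy sketch below uses invented two-class data and made-up names, not the report's code, purely to show how the two criteria score a single unit direction *e*.

```python
import numpy as np

def h_pca(e, S):
    # H_pca(e) = <e, S e>: total scatter captured along the unit direction e.
    return float(e @ S @ e)

def h_lda(e, Sb, Sw, eps=1e-6):
    # H_lda(e) = <e, Sb e> / (<e, Sw e> + eps), with eps regularising Sw.
    return float(e @ Sb @ e) / (float(e @ Sw @ e) + eps)

# Toy two-class data just to build the scatter matrices.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(3, 1, (40, 3))])
y = np.repeat([0, 1], 40)

mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu)                        # total scatter
Sb = np.zeros((3, 3))
Sw = np.zeros((3, 3))
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)   # between-class scatter
    Sw += (Xc - mc).T @ (Xc - mc)                # within-class scatter

e = rng.standard_normal(3)
e /= np.linalg.norm(e)
print(h_pca(e, S), h_lda(e, Sb, Sw))
```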
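The committee rules described in the majority-voting and confidence-weighted-averaging paragraphs reduce to a few lines. The sketch below reuses the text's own 95%/40% example and the $K=5$ neighbour-proportion confidence; the helper names are hypothetical and do not come from the repository.

```python
from collections import Counter

def majority_vote(labels):
    # Committee label = most frequent label across the ensemble's predictions.
    return Counter(labels).most_common(1)[0][0]

def knn_confidences(neighbour_labels):
    # Confidence per class = proportion of the K nearest neighbours with that class,
    # e.g. K=5 with neighbours C,C,C,B,D -> {C: 0.6, B: 0.2, D: 0.2}.
    k = len(neighbour_labels)
    return {label: n / k for label, n in Counter(neighbour_labels).items()}

def confidence_weighted_vote(per_model_confidences):
    # Sum each model's per-class confidences and pick the class with the largest total.
    totals = Counter()
    for conf in per_model_confidences:
        totals.update(conf)
    return totals.most_common(1)[0][0]

print(knn_confidences(["C", "C", "C", "B", "D"]))                        # {'C': 0.6, 'B': 0.2, 'D': 0.2}
print(majority_vote(["A", "B", "B"]))                                    # 'B' (two votes beat one)
print(confidence_weighted_vote([{"A": 0.95}, {"B": 0.4}, {"B": 0.4}]))   # 'A' (0.95 beats 0.4 + 0.4)
```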