author    Vasil Zlatanov <v@skozl.com>    2018-11-20 16:24:14 +0000
committer Vasil Zlatanov <v@skozl.com>    2018-11-20 16:24:14 +0000
commit    61c6abf5658025ef24255f827797e44c3922a043 (patch)
tree      1afab76177cae2b9e4fcb98a3d9bf38a0f1ac135 /report/paper.md
parent    daa03cd78886729e2c54c50a04addefc3e60eb8b (diff)
Improvements to section 1
Diffstat (limited to 'report/paper.md')
-rwxr-xr-x  report/paper.md  84
1 files changed, 39 insertions, 45 deletions
diff --git a/report/paper.md b/report/paper.md
index 7fb0961..bcb2386 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -2,22 +2,17 @@
## Partition and Standard PCA
-The data is partitioned to allow random selection of the
-same amount of samples for each class.
-In such way, each training vector space will be generated with
-the same amount of elements. The test data will instead
-be taken from the remaining samples. Testing on accuracy
-with respect to data partition indicates that the maximum
-accuracy is obtained when using 90% of the data for
-training. Despite such results we will be using 70% of the data
-for training as a standard. This will allow to give more than one
-example of success and failure for each class when classifying the
-test_data. Moreover using 90% training data would make the results
-obtained heavilly dependent on the seed chosen.
+The data is partitioned so that each class contributes an equal number of training samples, which is possible because every class contains the same number of samples.
+In this way, each training vector space is generated with
+the same number of elements. The test data is taken from the remaining samples.
+We will be using 70% of the data for training, as 80% and 90% splits give misleadingly large and highly seed-dependent accuracies.
+This also allows the observation of more than one
+success and failure case for each class when classifying the
+test data.
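+
+A minimal sketch of this per-class split is given below, assuming the images are stored row-wise in an array `X` of shape (N, 2576) with integer class labels `y`; the names and the helper itself are illustrative rather than the exact code used.
+
+```python
+import numpy as np
+
+def split_per_class(X, y, train_ratio=0.7, seed=0):
+    """Partition X, y so every class contributes the same number of training samples."""
+    rng = np.random.RandomState(seed)
+    train_idx, test_idx = [], []
+    for c in np.unique(y):
+        idx = np.flatnonzero(y == c)
+        rng.shuffle(idx)
+        n_train = int(train_ratio * len(idx))   # e.g. 7 of 10 samples per class
+        train_idx.extend(idx[:n_train])
+        test_idx.extend(idx[n_train:])
+    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
+```
+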
After partitioning the data into training and testing sets,
PCA is applied. The covariance matrix, S, of dimension
-2576x2576 (features x features), will have 2576 eigenvalues
+2576x2576 (features x features), has 2576 eigenvalues
and eigenvectors. The amount of non-zero eigenvalues and
eigenvectors obtained will only be equal to the amount of
training samples minus one. This can be observed in figure \ref{fig:logeig}
@@ -31,11 +26,12 @@ as a sudden drop for eigenvalues after the 363rd.
\end{center}
\end{figure}
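+
+A sketch of the covariance construction and eigen-decomposition described above, assuming the training faces are the rows of an array `X_train` (the variable names are placeholders); the data is centred with the mean face discussed below:
+
+```python
+import numpy as np
+
+# X_train: (N, D) training faces, D = 2576 features
+mean_face = X_train.mean(axis=0)
+A = (X_train - mean_face).T               # D x N matrix of centred faces
+S = (A @ A.T) / A.shape[1]                # D x D covariance matrix
+w, U = np.linalg.eigh(S)                  # eigenvalues ascending, eigenvectors as columns
+w, U = w[::-1], U[:, ::-1]                # sort descending
+print(int((w > 1e-8).sum()))              # at most N - 1 non-zero eigenvalues
+```
+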
-The mean image is calculated averaging the features of the
-training data. Changing the randomization seed will give
+The mean image is calculated by averaging the features of the
+training data. Changing the randomisation seed gives
very similar values, since the vast majority of the training
-faces used for averaging will be the same. Two mean faces
-obtained with different seeds for split can be observed in figure \ref{fig:mean_face}.
+faces used for averaging are the same. Two mean faces
+obtained with different seeds for split can be seen in
+figure \ref{fig:mean_face}.
\begin{figure}
\begin{center}
@@ -46,11 +42,10 @@ obtained with different seeds for split can be observed in figure \ref{fig:mean_
\end{center}
\end{figure}
-To perform face recognition we choose the best M eigenvectors
-associated with the largest eigenvalues. We tried
-different values of M, and we found an optimal point for
-M=99 with accuracy=57%. After such value the accuracy starts
-to flaten.
+To perform face recognition the best $M$ eigenvectors associated with the
+largest eigenvalues are chosen. We found that the optimal value of $M$
+when performing PCA is $M=99$, with an accuracy of 57%. For larger $M$
+the accuracy plateaus.
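+
+The recognition step can be sketched as follows, reusing `U`, `mean_face` and the split from the earlier snippets; a nearest-neighbour rule in the projected space is shown purely for illustration:
+
+```python
+M = 99                                     # number of eigenfaces retained
+W = U[:, :M]                               # D x M projection matrix
+train_proj = (X_train - mean_face) @ W     # N_train x M coefficients
+test_proj = (X_test - mean_face) @ W       # N_test x M coefficients
+
+# classify each test face by its nearest training face in the reduced space
+dists = np.linalg.norm(test_proj[:, None, :] - train_proj[None, :, :], axis=2)
+y_pred = y_train[np.argmin(dists, axis=1)]
+accuracy = (y_pred == y_test).mean()
+```
+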
\begin{figure}
\begin{center}
@@ -64,54 +59,51 @@ to flaten.
Performing the low-dimensional computation of the
eigenspace for PCA we obtain the same accuracy results
-of the high-dimensional computation previously used. A
+as the high-dimensional computation previously used. A
comparison between eigenvalues of the
two computation techniques used shows that the difference
is very small (due to rounding
-of the np.eigh function when calculating the eigenvalues
+in the `numpy.linalg.eigh` function when calculating the eigenvalues
and eigenvectors of the matrices A\textsuperscript{T}A (NxN) and AA\textsuperscript{T}
(DxD)). The first ten biggest eigenvalues obtained with each method
are shown in Appendix, table \ref{tab:eigen}.
It can be proven that the eigenvalues obtained are mathematically the same [@lecture-notes],
-and the there is a relation between the eigenvectors obtained: $\boldsymbol{u\textsubscript{i}} = A\boldsymbol{v\textsubscript{i}}$. (*Proof in the appendix*).
+and that there is a relation between the eigenvectors obtained: $\boldsymbol{u\textsubscript{i}} = A\boldsymbol{v\textsubscript{i}}$. (*Proof in appendix A*).
-It can be noticed that we effectively don't lose any data calculating the eigenvectors
-for PCA with the second method. The main advantages of it are in terms of speed,
+Experimentally there is no consequential loss of data when calculating the eigenvectors
+for PCA with the low-dimensional method. Its main advantages are reduced computation time
(since the two methods require on average respectively 3.4s and 0.11s), and complexity of computation
(since the eigenvectors found with the first method are extracted from a significantly
bigger matrix).
-The only drawback is that with method 1 the eigenfaces are generated directly through
-the covariance matrix, whereas method 2 requires an additional projection step.
+The drawback of the low-dimensional computation technique is that it includes an extra projection step, and as a result we do not directly obtain a Hermitian matrix, which would otherwise simplify the eigenvector computation.
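+
+A sketch of the low-dimensional computation, using the same centred matrix `A` as in the earlier snippet (assumed names):
+
+```python
+N = A.shape[1]
+S_small = (A.T @ A) / N                    # N x N instead of D x D
+w_small, V = np.linalg.eigh(S_small)
+w_small, V = w_small[::-1], V[:, ::-1]     # sort descending
+
+U_low = A @ V[:, :N - 1]                   # map v_i -> u_i = A v_i (non-zero eigenvalues only)
+U_low /= np.linalg.norm(U_low, axis=0)     # re-normalise each eigenface
+```
+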
# Question 1, Application of eigenfaces
## Image Reconstruction
-Using the computational method for fast PCA, face reconstruction is then performed.
-The quality of reconstruction will depend on the amount of eigenvectors picked.
-The results of varying M can be observed in fig.\ref{fig:face160rec}. Two faces from classes
-number 21 and 2 respectively, are reconstructed as shown in fig.\ref{fig:face10rec} with respective M values
-of M=10, M=100, M=200, M=300. The last picture is the original face.
+Face reconstruction is performed with the faster low-dimensional PCA computation.
+The quality of reconstruction depends on the number of eigenvectors used.
+The results of varying the number of eigenvectors $M$ can be observed in fig.\ref{fig:face160rec}.
+Two faces, from classes 21 and 2 respectively, are reconstructed as shown
+in fig.\ref{fig:face10rec} with $M=10$, $M=100$, $M=200$ and $M=300$. The rightmost picture is the original face.
![Reconstructed Face C21\label{fig:face160rec}](fig/face160rec.pdf)
![Reconstructed Face C2\label{fig:face10rec}](fig/face10rec.pdf)
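+
+The reconstruction itself amounts to projecting a face onto the first $M$ eigenfaces and adding the mean back; a sketch using the variables from the snippets above:
+
+```python
+def reconstruct(face, U, mean_face, M):
+    """Rebuild a face from its first M eigenface coefficients."""
+    W = U[:, :M]
+    coeffs = W.T @ (face - mean_face)      # project onto the M eigenfaces
+    return mean_face + W @ coeffs          # map back to the 2576-dimensional space
+
+partial = [reconstruct(X_test[0], U_low, mean_face, M) for M in (10, 100, 200, 300)]
+```
+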
-It is already observable that the improvement in reconstruction is marginal for M=200
-and M=300. For such reason choosing M close to 100 is good enough for such purpose.
-Observing in fact the variance ratio of the principal components, the contribution
-they'll have will be very low for values above 100, hence we will require a much higher
-quantity of components to improve reconstruction quality. With M=100 we will be able to
-use effectively 97% of the information from our initial training data for reconstruction.
+It is visible that the improvement in reconstruction is marginal for $M=200$
+and $M=300$. For this reason, choosing $M$ larger than 100 gives diminishing returns.
+This is evident when looking at the variance ratio of the principal components, as their contribution is very low for values above 100.
+With $M=100$ we are able to reconstruct effectively 97% of the information from our initial training data.
Refer to figure \ref{fig:eigvariance} for the data variance associated with each of the M
eigenvalues.
\begin{figure}
\begin{center}
\includegraphics[width=17em]{fig/variance.pdf}
-\caption{Data variance carried by each of M eigenvalues}
+\caption{Data variance carried by each of $M$ eigenvalues}
\label{fig:eigvariance}
\end{center}
\end{figure}
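+
+The variance ratio referred to above is simply the normalised cumulative sum of the eigenvalues; a sketch, reusing the eigenvalue array `w` from the earlier snippet:
+
+```python
+variance_ratio = w / w.sum()               # contribution of each principal component
+cumulative = np.cumsum(variance_ratio)
+M_97 = int(np.searchsorted(cumulative, 0.97)) + 1   # smallest M capturing ~97% of the variance
+```
+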
@@ -477,7 +469,7 @@ Seed & Individual$(M=120)$ & Bag + Feature Ens.$(M=60+95)$\\ \hline
## Eigenvectors and Eigenvalues in fast PCA
-**Table showing eigenvalues obtained with each method**
+### Table showing eigenvalues obtained with each method
\begin{table}[ht]
\centering
@@ -498,7 +490,7 @@ PCA &Fast PCA\\ \hline
\label{tab:eigen}
\end{table}
-**Proof of relationship between eigenvalues and eigenvectors in the different methods**
+### Proof of relationship between eigenvalues and eigenvectors in the different methods
Computing the eigenvectors **u\textsubscript{i}** for the DxD matrix AA\textsuperscript{T}
we obtain a very large matrix. The computation process can get very expensive when $D \gg N$.
@@ -514,6 +506,8 @@ We know that $S\boldsymbol{u\textsubscript{i}} = \lambda \textsubscript{i}\bolds
From here it follows that AA\textsuperscript{T} and A\textsuperscript{T}A have the same eigenvalues and their eigenvectors follow the relationship $\boldsymbol{u\textsubscript{i}} = A\boldsymbol{v\textsubscript{i}}$
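+
+This relationship can also be checked numerically; a small sketch using the variables from the earlier snippets (assumed names, with the sign ambiguity of eigenvectors handled explicitly):
+
+```python
+# the non-zero eigenvalues of AA^T/N and A^T A/N agree ...
+print(np.allclose(w[:N - 1], w_small[:N - 1]))
+
+# ... and A v_i is parallel to u_i for each such eigenvalue
+i = 0
+u, v = U[:, i], A @ V[:, i]
+v /= np.linalg.norm(v)
+print(np.allclose(u, v) or np.allclose(u, -v))
+```
+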
-# Code
+## Code
-All code and \LaTeX sources are available at [https://git.skozl.com/e4-pattern/](https://git.skozl.com/e4-pattern/).
+All code and \LaTeX sources are available at:
+
+[https://git.skozl.com/e4-pattern/](https://git.skozl.com/e4-pattern/).