# Question 1, Eigenfaces

The data is partitioned by randomly selecting the same number of samples from each class. This prevents the classifier from being biased toward classes that would otherwise be overrepresented: each training vector space is generated from the same number of elements. The test data is taken from the remaining samples. Measuring accuracy as a function of the partition shows that the maximum accuracy is obtained when 90% of the data is used for training. Despite this result we use 80% of the data for training as our standard, since it leaves enough test samples to show more than one example of success and failure per class when classifying the test data.

![Classification Accuracy of Test Data vs % of data used for training](fig/partition.pdf "Partition")

After partitioning the data into training and testing sets, PCA is applied. The covariance matrix S, of dimension 2576x2576 (features x features), has 2576 eigenvalues and eigenvectors, but the number of non-zero eigenvalues is only equal to the number of training samples minus one. This can be observed in the graph below as a sudden drop in the eigenvalues after the 415th.

![Log PCA Eigenvalues](fig/eigenvalues.pdf "Eigenvalues")

The mean image is calculated by averaging the features of the training data. Changing the randomization seed gives very similar values, since the vast majority of the training faces used for averaging are the same. The mean face for our standard seed is shown below.

![Mean Face](fig/mean_face.pdf){ width=1em }

To perform face recognition we keep the best M eigenvectors, those associated with the largest eigenvalues. Trying different values of M, we found an optimal point at M=42 with accuracy=66.3%. Beyond this value the accuracy starts to flatten, with some exceptions at which it decreases. Physically, each retained eigenvector is an "eigenface": a direction of maximal variance of the training faces around the mean. The leading eigenfaces capture coarse variation such as illumination and overall facial structure, while later ones encode increasingly fine detail that adds little discriminative information, which is why the accuracy saturates.

![Recognition Accuracy of Test data varying M](fig/accuracy.pdf "Accuracy1")
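As an illustration of the pipeline described in this section, the minimal NumPy sketch below partitions the data per class, computes the eigenfaces from the covariance matrix, and classifies the test projections by nearest neighbour. The names `X` (a hypothetical 520x2576 data matrix, 52 classes with 10 images each) and `y` (integer class labels) are illustrative assumptions, and the sketch is a simplification of, not a substitute for, our actual code.

```python
import numpy as np

# Hypothetical inputs: X is the (520, 2576) data matrix (52 classes,
# 10 images per class, 2576 pixels per image); y holds the class labels.
rng = np.random.default_rng(seed=0)           # our "standard seed"

def split_per_class(X, y, train_fraction=0.8, rng=rng):
    """Randomly pick the same number of training samples from every class."""
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        n_train = int(train_fraction * len(idx))   # 8 of 10 images per class
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

train_idx, test_idx = split_per_class(X, y)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# High-dimensional PCA: mean face, D x D covariance, eigendecomposition.
mean_face = X_train.mean(axis=0)
A = X_train - mean_face                       # N x D centred training faces
S = A.T @ A / len(A)                          # 2576 x 2576, rank <= N - 1
eigvals, eigvecs = np.linalg.eigh(S)          # eigh: S is symmetric;
eigvals = eigvals[::-1]                       # values come back ascending,
eigvecs = eigvecs[:, ::-1]                    # so put the largest first

# Keep the best M eigenfaces and classify by nearest neighbour in that space.
M = 42
W = eigvecs[:, :M]                            # D x M projection matrix
train_proj = A @ W
test_proj = (X_test - mean_face) @ W

dists = np.linalg.norm(test_proj[:, None] - train_proj[None], axis=-1)
pred = y_train[np.argmin(dists, axis=1)]      # label of the closest face
accuracy = (pred == y_test).mean()
```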
# Question 1, Application of eigenfaces

Performing the low-dimensional computation of the eigenspace for PCA, we obtain the same accuracy as with the high-dimensional computation used previously. A comparison of the eigenvalues and eigenvectors produced by the two techniques shows that the difference is very small; it is due to floating-point rounding in np.linalg.eigh when diagonalizing the matrices $A^TA$ ($D \times D$) and $AA^T$ ($N \times N$). The ten largest eigenvalues obtained with each method are shown in the table below.

\begin{table}[ht]
\centering
\begin{tabular}[t]{cc}
PCA &Fast PCA\\
2.9755E+05 &2.9828E+05\\
1.4873E+05 &1.4856E+05\\
1.2286E+05 &1.2259E+05\\
7.5084E+04 &7.4950E+04\\
6.2575E+04 &6.2428E+04\\
4.7024E+04 &4.6921E+04\\
3.7118E+04 &3.7030E+04\\
3.2101E+04 &3.2046E+04\\
2.7871E+04 &2.7814E+04\\
2.4396E+04 &2.4339E+04\\
\end{tabular}
\caption{Comparison of the eigenvalues obtained with the two computation methods}
\end{table}

It can be proven that the non-zero eigenvalues obtained are the same and that the eigenvectors are related by a linear map. Let $A$ be the $N \times D$ matrix of mean-centred training faces, so that the covariance matrix is proportional to $A^TA$, and let $v_i$ be an eigenvector of the $N \times N$ matrix $AA^T$ with eigenvalue $\lambda_i$:
$$AA^T v_i = \lambda_i v_i.$$
Left-multiplying both sides by $A^T$ gives
$$(A^TA)(A^T v_i) = \lambda_i (A^T v_i),$$
so every non-zero eigenvalue $\lambda_i$ of $AA^T$ is also an eigenvalue of $A^TA$, with corresponding eigenvector $u_i = A^T v_i / \lVert A^T v_i \rVert$.

Using the low-dimensional method for fast PCA, face reconstruction is then performed. The quality of the reconstruction depends on the number of eigenvectors kept. The effect of varying M can be observed in the picture below: a face from class number 21 is reconstructed with M=10, M=100, M=200 and M=300; the last picture is the original face.

![Reconstructed Face](fig/face160rec.pdf)

The improvement in reconstruction is already marginal between M=200 and M=300, so choosing M close to 100 is good enough for this purpose.
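Following the proof above, a minimal sketch of the low-dimensional computation and of the reconstruction step might look as follows. It reuses the hypothetical `X_train` matrix of shape (416, 2576) from the previous sketch; again, all names are illustrative.

```python
import numpy as np

# Low-dimensional ("fast") PCA and reconstruction. X_train is assumed to be
# the (N, D) = (416, 2576) training matrix from the previous sketch.
N, D = X_train.shape
mean_face = X_train.mean(axis=0)
A = X_train - mean_face                       # N x D centred training faces

L = A @ A.T / N                               # N x N instead of D x D
lam, V = np.linalg.eigh(L)                    # cheap: N = 416, not D = 2576
lam, V = lam[::-1], V[:, ::-1]                # largest eigenvalues first

# Map eigenvectors of A A^T back to face space, u_i = A^T v_i (see the proof
# above); after centring, only the first N - 1 eigenvalues are non-zero.
U = (A.T @ V)[:, :N - 1]                      # D x (N - 1) eigenfaces
U /= np.linalg.norm(U, axis=0)                # normalise to unit length

def reconstruct(face, M):
    """Project a face onto the first M eigenfaces and map it back."""
    weights = U[:, :M].T @ (face - mean_face)
    return mean_face + U[:, :M] @ weights

# Reconstructions of one face for the M values shown in the figure.
approximations = [reconstruct(X_train[0], M) for M in (10, 100, 200, 300)]
```

Diagonalizing the 416x416 matrix instead of the 2576x2576 one is what makes this method fast; up to sign and rounding, the eigenfaces it recovers are the same as those of the high-dimensional computation.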