path: root/report/paper.md
author    nunzip <np.scarh@gmail.com>    2018-11-12 22:26:49 +0000
committer nunzip <np.scarh@gmail.com>    2018-11-12 22:26:49 +0000
commit    721dbfa8a3682289a8199083e2e5e06a618b975a (patch)
tree      d4d20a81d01c697b00bba1b45b76efe8a4f274dd /report/paper.md
parent    c8a931059adc4bf60ddd1ba0dab1c24b00a75b0f (diff)
Rewrite Part 2
Diffstat (limited to 'report/paper.md')
-rwxr-xr-x  report/paper.md  110
1 file changed, 93 insertions(+), 17 deletions(-)
diff --git a/report/paper.md b/report/paper.md
index c84530e..d286b08 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -12,7 +12,9 @@ for training as a standard. This allows us to give more than one
example of success and failure for each class when classifying the
test_data.
-![Classification Accuracy of Test Data vs % of data used for training](fig/partition.pdf "Partition")
+\begin{center}
+\includegraphics[width=20em]{fig/partition.pdf}
+\end{center}
After partitioning the data into training and testing sets,
PCA is applied. The covariance matrix, S, of dimension
@@ -23,15 +25,19 @@ training samples minus one. This can be observed in the
graph below as a sudden drop in the eigenvalues after the
415th.
-![Log PCA Eigenvalues](fig/eigenvalues.pdf "Eigenvalues")
+\begin{center}
+\includegraphics[width=20em]{fig/eigenvalues.pdf}
+\end{center}
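
A minimal numpy sketch of this step is given below (the D x N data matrix and its values are illustrative placeholders, not the report's actual variables); at most N-1 eigenvalues of S are non-zero, which for 416 training faces corresponds to the drop after the 415th value.

```python
import numpy as np

# Illustrative stand-in for the training data: D pixel values per face, N faces as columns.
D, N = 2576, 416
rng = np.random.default_rng(0)
X_train = rng.random((D, N))

# Mean face and covariance matrix S = (1/N) * A A^T.
mean_face = X_train.mean(axis=1, keepdims=True)
A = X_train - mean_face
S = (A @ A.T) / N

eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues sorted in descending order
print(np.sum(eigvals > 1e-8))           # at most N - 1 = 415 non-zero eigenvalues
```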
The mean image is calculated by averaging the features of the
training data. Changing the randomization seed will give
very similar values, since the vast majority of the training
faces used for averaging will be the same. The mean face
for our standard seed can be observed below.
-
-![Mean Face](fig/mean_face.pdf){ width=1em }
+
+\begin{center}
+\includegraphics[width=8em]{fig/mean_face.pdf}
+\end{center}
To perform face recognition we choose the best M eigenvectors
associated with the largest eigenvalues. We tried
@@ -40,7 +46,9 @@ M=42 with accuracy=66.3%. After such value the accuracy starts
to flatten, with some exceptions at points where the accuracy decreases.
WE NEED TO ADD PHYSICAL MEANINGS
-![Recognition Accuracy of Test data varying M](fig/accuracy.pdf "Accuracy1")
+\begin{center}
+\includegraphics[width=20em]{fig/accuracy.pdf}
+\end{center}
# Question 1, Application of eigenfaces
@@ -86,9 +94,9 @@ we obtain a very large matrix. The computation process can get very expensive wh
For this reason we compute the eigenvectors **v\textsubscript{i}** of the NxN
matrix A\textsuperscript{T}A. From the computation it follows that $$ A\textsuperscript{T}A\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}\boldsymbol{v\textsubscript{i}} $$
-Multiplying both side by A we obtain: $$ AA\textsuperscript{T}A\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}A\boldsymbol{v\textsubscript{i}} $$
+Multiplying both sides by A we obtain:
-$$ SA\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}A\boldsymbol{v\textsubscript{i}} $$
+$$ AA\textsuperscript{T}A\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}A\boldsymbol{v\textsubscript{i}} \rightarrow SA\boldsymbol{v\textsubscript{i}} = \lambda \textsubscript{i}A\boldsymbol{v\textsubscript{i}} $$
We know that $$ S\boldsymbol{u\textsubscript{i}} = \lambda \textsubscript{i}\boldsymbol{u\textsubscript{i}} $$
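
Comparing the two expressions, the eigenvectors of S are recovered as **u\textsubscript{i}** = A**v\textsubscript{i}** (up to normalisation). A minimal numpy sketch of this low-dimensional computation, with illustrative variable names rather than the report's own code:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, M = 2576, 416, 100
A = rng.random((D, N))
A -= A.mean(axis=1, keepdims=True)       # centred data, one face per column

# Eigendecomposition of the small N x N matrix A^T A instead of the D x D matrix S.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]           # sort by decreasing eigenvalue

# Map back: u_i = A v_i is an eigenvector of S = A A^T with the same eigenvalue.
U = A @ V[:, :M]
U /= np.linalg.norm(U, axis=0)           # normalised eigenfaces, one per column

# Check S u_0 = lambda_0 u_0 without ever forming the D x D matrix S.
residual = A @ (A.T @ U[:, 0]) - lam[0] * U[:, 0]
print(np.linalg.norm(residual))          # ~0
```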
@@ -110,7 +118,9 @@ Observing in fact the variance ratio of the principal components, the contributi
they'll have will be very low for values above 100, hence a much larger
number of components would be required to improve reconstruction quality.
-![Variance Ratio](fig/variance.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/variance.pdf}
+\end{center}
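
The variance ratio itself is obtained directly from the eigenvalues, as in the short sketch below (again on placeholder data; on the face data the components beyond roughly the 100th each contribute very little):

```python
import numpy as np

# 'eigvals' stands for the PCA eigenvalues sorted in descending order (placeholder data).
rng = np.random.default_rng(0)
A = rng.random((2576, 416))
A -= A.mean(axis=1, keepdims=True)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

variance_ratio = eigvals / eigvals.sum()   # contribution of each principal component
cumulative = np.cumsum(variance_ratio)     # variance retained when keeping the first M
print(variance_ratio[100], cumulative[100])
```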
The classification methods analysed for face recognition are **Nearest Neighbor** and
an **alternative method** based on reconstruction error.
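
A minimal sketch of the Nearest Neighbor classifier in the reduced eigenspace is given below (the names `U`, `mean_face` and the data splits are placeholders for the variables used in our implementation):

```python
import numpy as np

def project(X, U, mean_face):
    """Project faces (columns of X) onto the M leading eigenfaces in U (D x M)."""
    return U.T @ (X - mean_face)

def nn_classify(X_train, y_train, X_test, U, mean_face):
    """Label each test face with the class of its nearest training projection."""
    W_train = project(X_train, U, mean_face)     # M x N_train
    W_test = project(X_test, U, mean_face)       # M x N_test
    # Euclidean distance between every test projection and every training projection.
    dists = np.linalg.norm(W_test[:, :, None] - W_train[:, None, :], axis=0)
    return y_train[np.argmin(dists, axis=1)]     # one predicted label per test face
```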
@@ -121,58 +131,124 @@ REFER TO ACCURACY GRAPH 1 FOR NN. MAYBE WE CAN ALSO ADD SAME GRAPH WITH DIFFEREN
A confusion matrix showing success and failure cases for Nearest Neighbor classification
can be observed below:
-![Confusion Matrix NN, K=1](fig/pcacm.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/pcacm.pdf}
+%![Confusion Matrix NN, K=1](fig/pcacm.pdf)
+\end{center}
An example of failed classification is a test face from class 2, wrongly labeled as class 5:
-![Class 2 (left) labeled as class 5 (right)](fig/failure_2_5.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/failure_2_5.pdf}
+%![Class 2 (left) labeled as class 5 (right)](fig/failure_2_5.pdf)
+\end{center}
The alternative method shows better overall performance, with a peak accuracy of 73%
for M=3. The maximum number of non-zero eigenvectors M that can be used is, in this case,
at most the number of training samples per class minus one, since the same number of
eigenvectors is used for each generated class subspace.
-![Accuracy of Alternative Method varying M](fig/alternative_accuracy.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/alternative_accuracy.pdf}
+%![Accuracy of Alternative Method varying M](fig/alternative_accuracy.pdf)
+\end{center}
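
A sketch of how this alternative method could be implemented is shown below (placeholder names, illustrative only): a separate PCA subspace is built from each class's training faces, and a test face is assigned to the class whose subspace reconstructs it with the smallest error.

```python
import numpy as np

def class_subspaces(X_train, y_train, M=3):
    """Per-class mean and M leading eigenvectors (M <= faces per class - 1)."""
    subspaces = {}
    for c in np.unique(y_train):
        Xc = X_train[:, y_train == c]
        mu = Xc.mean(axis=1, keepdims=True)
        Ac = Xc - mu
        # Low-dimensional trick on the small (n_c x n_c) matrix, as in Question 1.
        _, V = np.linalg.eigh(Ac.T @ Ac)
        Uc = Ac @ V[:, ::-1][:, :M]
        Uc /= np.linalg.norm(Uc, axis=0)
        subspaces[c] = (mu, Uc)
    return subspaces

def reconstruction_classify(X_test, subspaces):
    """Assign each test face to the class whose subspace gives the smallest error."""
    labels = []
    for x in X_test.T:                           # one test face per column of X_test
        errors = {}
        for c, (mu, Uc) in subspaces.items():
            phi = x[:, None] - mu
            recon = Uc @ (Uc.T @ phi)            # projection onto the class subspace
            errors[c] = np.linalg.norm(phi - recon)
        labels.append(min(errors, key=errors.get))
    return np.array(labels)
```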
A confusion matrix showing success and failure cases for the alternative method classification
can be observed below:
-![Confusion Matrix alternative method, M=3](fig/altcm.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/altcm.pdf}
+%![Confusion Matrix alternative method, M=3](fig/altcm.pdf)
+\end{center}
It can be observed that even with this more accurate classification, there is still one
instance where the same face from class 2 is mislabeled as class 5. An additional
classification failure, of class 6 labeled as class 7, can be observed below:
-![Class 6 (left) labeled as class 7 (right)](fig/failure_6_7.pdf)
+\begin{center}
+\includegraphics[width=20em]{fig/failure_6_7.pdf}
+%![Class 6 (left) labeled as class 7 (right)](fig/failure_6_7.pdf)
+\end{center}
-# Part 2
+# Question 2, Generative and Discriminative Subspace Learning
We want to maximize the function J(W) (Fisher's Criterion):
+
$$ J(W) = \frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}\textrm{ or } J(W) = \frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{t}W}$$
+
Here S\textsubscript{B} is the between-class scatter matrix, S\textsubscript{W}
the within-class scatter matrix and W the set of projection vectors. $\mu\textsubscript{c}$
represents the mean vector of class c and $\overline{x}$ the overall mean of the training data.
+
$$ S\textsubscript{B} = \sum\limits_{c}(\mu\textsubscript{c} - \overline{x})(\mu\textsubscript{c} - \overline{x})\textsuperscript{T} $$
-$$ S\textsubscript{W} = \sum\limits_{c}\sum\limits_{i\epsilon c}(x\textsubscript{i} - \mu\textsubscript{c})(x\textsubscript{i} - \mu\textsubscript{c})\textsuperscript{T} $$
+$$ S\textsubscript{W} = \sum\limits_{c}\sum\limits_{i\in c}(x\textsubscript{i} - \mu\textsubscript{c})(x\textsubscript{i} - \mu\textsubscript{c})\textsuperscript{T} $$
+
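A brief numpy sketch of these two scatter matrices, following the definitions above (the data layout, one face per column with integer labels, is an assumption made for illustration):

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter for faces in the columns of X."""
    D = X.shape[0]
    overall_mean = X.mean(axis=1, keepdims=True)
    S_B = np.zeros((D, D))
    S_W = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        S_B += (mu_c - overall_mean) @ (mu_c - overall_mean).T
        Ac = Xc - mu_c
        S_W += Ac @ Ac.T                 # sum over the samples of class c
    return S_B, S_W
```
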
To maximize J(W) we differentiate with respect to W and equate to zero:
+
$$ \frac{d}{dW}J(W) = \frac{d}{dW}(\frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}) = 0 $$
$$ (W\textsuperscript{T}S\textsubscript{W}W)\frac{d(W\textsuperscript{T}S\textsubscript{B}W)}{dW} - (W\textsuperscript{T}S\textsubscript{B}W)\frac{d(W\textsuperscript{T}S\textsubscript{W}W)}{dW} = 0 $$
$$ (W\textsuperscript{T}S\textsubscript{W}W)2S\textsubscript{B}W - (W\textsuperscript{T}S\textsubscript{B}W)2S\textsubscript{W}W = 0 $$
$$ S\textsubscript{B}W - JS\textsubscript{W}W = 0 $$
+
Multiplying by the inverse of S\textsubscript{W} we obtain:
+
$$ S\textsubscript{W}\textsuperscript{-1}S\textsubscript{B}W - JW = 0 $$
+
From here it follows (in the two-class case):
-$$ W\textsuperscript{*} = argmax(\frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}) = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2}) $$
+
+$$ W\textsuperscript{*} = \underset{W}{\operatorname{arg\,max}}\left(\frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}\right) = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2}) $$
+
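A toy sketch of this closed-form two-class solution (entirely illustrative 2-D data, not our face experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal([0, 0], 1.0, size=(50, 2))     # class 1 samples, one per row
x2 = rng.normal([4, 2], 1.0, size=(50, 2))     # class 2 samples

mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
S_W = (x1 - mu1).T @ (x1 - mu1) + (x2 - mu2).T @ (x2 - mu2)

# Optimal projection direction: w* = S_W^{-1} (mu_1 - mu_2)
w = np.linalg.solve(S_W, mu1 - mu2)
print(w / np.linalg.norm(w))
```
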
By an isomorphic mapping, where P is the matrix of eigenvectors generated through PCA:
+
$$ W = PX $$
+
We can substitute for W in the J(W) expression, obtaining:
+
$$ J(W) = \frac{X\textsuperscript{T}P\textsuperscript{T}S\textsubscript{B}PX}{X\textsuperscript{T}P\textsuperscript{T}S\textsubscript{t}PX} $$
+
We can rewrite this expression by substituting:
+
$$ P\textsuperscript{T}S\textsubscript{B}P = \widetilde{S}\textsubscript{B} \textrm{ and } P\textsuperscript{T}S\textsubscript{t}P = \widetilde{S}\textsubscript{t} $$
$$ J(W) = \widetilde{J}(W) = \frac{X\textsuperscript{T}\widetilde{S}\textsubscript{B}X}{X\textsuperscript{T}\widetilde{S}\textsubscript{t}X} $$
-# Conclusion
+$\widetilde{S}\textsubscript{B}$ and $\widetilde{S}\textsubscript{t}$ are respectively
+positive semi-definite and positive definite, so $\widetilde{J}(X)$
+acts like Fisher's criterion but in the PCA-transformed space. This method
+does not result in any loss of data.
+
+*Proof:*
+
+The set of optimal discriminant vectors for LDA can be found in
+R\textsuperscript{n}. However, this is a difficult computation because the
+dimension is very high and S\textsubscript{t} is always singular. Fortunately,
+it is possible to reduce the problem to a lower-dimensional space.
+
+Suppose **b\textsubscript{1}**, ..., **b\textsubscript{n}** are the eigenvectors of S\textsubscript{t}.
+The first M eigenvalues are positive, where M = *rank*(S\textsubscript{t}), while
+the remaining eigenvectors, from M+1 to n, span the null space of S\textsubscript{t}.
+
+**Theorem:**
+*$$ \textrm{For any arbitrary } \varphi \in R\textsuperscript{n}, \varphi
+\textrm{ can be written as } \varphi = X + \xi, $$
+$$ \textrm{where } X \in \phi\textsubscript{t}\textrm{ and } \xi \in
+\phi\textsubscript{t}\textsuperscript{\perp}\textrm{, and it
+satisfies } J(\varphi)=J(X). $$*
+
+According to the theorem, it is possible to find optimal discriminant
+vectors, reducing the problem dimension without any loss of information
+with respect to Fisher’s criterion.
+
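A compact sketch of this PCA-then-LDA procedure is given below (names such as `fisherfaces`, `M_pca` and `M_lda` are placeholders): the data is first projected onto the leading eigenfaces, the reduced scatter matrices $\widetilde{S}\textsubscript{B}$ and $\widetilde{S}\textsubscript{t}$ are formed, and the generalised eigenproblem is solved in that smaller space.

```python
import numpy as np
from scipy.linalg import eigh as generalized_eigh

def fisherfaces(X, y, M_pca, M_lda):
    """PCA projection P followed by LDA in the reduced space; returns W = P X_opt."""
    # PCA step: M_pca leading eigenfaces via the low-dimensional trick.
    mean_face = X.mean(axis=1, keepdims=True)
    A = X - mean_face
    _, V = np.linalg.eigh(A.T @ A)
    P = A @ V[:, ::-1][:, :M_pca]
    P /= np.linalg.norm(P, axis=0)

    # Reduced scatter matrices S~_B and S~_t in the PCA space.
    Z = P.T @ A                                  # M_pca x N projected, zero-mean data
    St = Z @ Z.T                                 # positive definite for M_pca <= N - 1
    Sb = np.zeros((M_pca, M_pca))
    for c in np.unique(y):
        mu_c = Z[:, y == c].mean(axis=1, keepdims=True)
        Sb += mu_c @ mu_c.T                      # overall mean of Z is zero

    # Generalised eigenproblem S~_B x = J S~_t x; keep the M_lda leading solutions.
    _, X_opt = generalized_eigh(Sb, St)
    X_opt = X_opt[:, ::-1][:, :M_lda]
    return P @ X_opt                             # final projection W = P X
```

Since $\widetilde{S}\textsubscript{B}$ has rank at most c-1 for c classes, `M_lda` is chosen at most c-1, and the returned W can then be used like the eigenface projection in the earlier sketches.
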
+The Fisherfaces method is effective because it requires less computation
+time than regular LDA. It also achieves lower error rates than the
+Eigenfaces method. Thus, it combines the strengths of discriminative
+and reconstructive approaches.
+
+# Question 3, LDA Ensemble for Face Recognition, PCA-LDA
+# Question 3, LDA Ensemble for Face Recognition, PCA-LDA Ensemble
# References