 report/metadata.yaml |  2 +-
 report/paper.md      | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/report/metadata.yaml b/report/metadata.yaml
index b404339..7113dce 100755
--- a/report/metadata.yaml
+++ b/report/metadata.yaml
@@ -4,7 +4,7 @@ author:
 - name: Vasil Zlatanov, Nunzio Pucci
   affilation: Imperial College
   location: London, UK
-  email: vz215@ic.ac.uk, np1915@ic.ac.uk
+  email: CID:01120518, CID:01113180
 numbersections: yes
 lang: en
 babel-lang: english
diff --git a/report/paper.md b/report/paper.md
index fc784b7..7fb0961 100755
--- a/report/paper.md
+++ b/report/paper.md
@@ -230,13 +230,13 @@ affect recognition the most are: glasses, hair, sex and brightness of the pictur
 To combine both method it is possible to perform LDA in a generative subspace created by PCA.
 In order to maximize class separation and minimize the distance between elements of the same class it is necessary to
-maximize the function J(W) (generalized Rayleigh quotient): $J(W) = \frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}$
+maximize the function J(W) (generalized Rayleigh quotient): $J(W) = \frac{W\textsuperscript{T}S\textsubscript{B}W}{W\textsuperscript{T}S\textsubscript{W}W}$.
 
 With S\textsubscript{B} being the scatter matrix between classes, S\textsubscript{W} being the within-class
 scatter matrix and W being the set of projection vectors.
 $\mu$ represents the mean of each class.
 
-It can be proven that when we have a singular S\textsubscript{W} we obtain [@lecture-notes]: $W\textsubscript{opt} = arg\underset{W}max\frac{|W\textsuperscript{T}S\textsubscript{B}W|}{|W\textsuperscript{T}S\textsubscript{W}W|} = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2})$
+It can be proven that when we have a singular S\textsubscript{W} we obtain [@lecture-notes]: $W\textsubscript{opt} = arg\underset{W}max\frac{|W\textsuperscript{T}S\textsubscript{B}W|}{|W\textsuperscript{T}S\textsubscript{W}W|} = S\textsubscript{W}\textsuperscript{-1}(\mu\textsubscript{1} - \mu\textsubscript{2})$.
 
 However S\textsubscript{W} is often singular since the rank of
 S\textsubscript{W} is at most N-c and usually N is smaller than D.
 In such case it is possible to use
@@ -258,7 +258,7 @@ small number) are H\textsubscript{pca}(*e*)=
 
 Through linear interpolation, for $0\leq t \leq 1$: $F\textsubscript{t}(e)=\frac{1-t}{2}
 H\textsubscript{pca}(e)+\frac{t}{2}H\textsubscript{lda}(e)=
-\frac{1-t}{2}<e,S\textsubscript{e}>+\frac{t}{2}\frac{<e, S\textsubscript{B}e>}{<e,S\textsubscript{W}e> + \epsilon}$
+\frac{1-t}{2}<e,S\textsubscript{e}>+\frac{t}{2}\frac{<e, S\textsubscript{B}e>}{<e,S\textsubscript{W}e> + \epsilon}$.
 
 The objective is to find a unit vector *e\textsubscript{t}* in **R**\textsuperscript{n}
 (with n being the number of samples) such that: $e\textsubscript{t}=arg\underset{et}min F\textsubscript{t}(e)$.
@@ -268,7 +268,7 @@ We can model the Lagrange optimization problem under the constraint of ||*e*||
 
 To minimize we take the derivative with respect to *e* and equate L to zero:
 $\frac {\partial L(e\lambda)}{\partial e}=\frac{\partial F\textsubscript{t}(e)}{\partial e}
-+\frac{\partial\lambda(||e||\textsuperscript{2}-1)}{\partial e}=0$
++\frac{\partial\lambda(||e||\textsuperscript{2}-1)}{\partial e}=0$.
 
 Being $\nabla F\textsubscript{t}(e)=
 (1-t)Se+\frac{t}{<e,S\textsubscript{W}e> +\epsilon}S\textsubscript{B}e-t\frac{<e,S\textsubscript{B}e>}{(<e,S\textsubscript{W}
@@ -278,7 +278,7 @@ parallel to *e*. Since S is positive semi-definite, $<\nabla F\textsubscript{t}(
 It means that $\lambda$ needs to be greater than zero. In such case, normalizing both sides
 we obtain $\frac{\nabla F\textsubscript{t}(e)}{||\nabla F\textsubscript{t}(e)||}=e$.
 
-We can express *T(e)* as $T(e) = \frac{\alpha e+ \nabla F\textsubscript{t}(e)}{||\alpha e+\nabla F\textsubscript{t}(e)||}$ (adding a positive multiple of *e*, $\alpha e$ to prevent $\lambda$ from vanishing
+We can express *T(e)* as $T(e) = \frac{\alpha e+ \nabla F\textsubscript{t}(e)}{||\alpha e+\nabla F\textsubscript{t}(e)||}$ (adding a positive multiple of *e*, $\alpha e$ to prevent $\lambda$ from vanishing).
 
 It is then possible to use the gradient descent optimization method to perform an iterative
 procedure that solves our optimization problem, using e\textsubscript{n+1}=T(e\textsubscript{n}) and updating
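The fixed-point procedure patched above can be sketched numerically. Below is a minimal NumPy illustration of the interpolated criterion $F_t$ and the update $e_{n+1}=T(e_n)$ from the paper's text; the two-class toy data, the dimension, and the choices $t=0.5$, $\alpha=1$ are assumptions for demonstration only, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class toy data (stand-in for the face dataset)
X1 = rng.normal(0.0, 1.0, (50, 4)) + np.array([2.0, 0.0, 0.0, 0.0])
X2 = rng.normal(0.0, 1.0, (50, 4)) - np.array([2.0, 0.0, 0.0, 0.0])
X = np.vstack([X1, X2])
mu, mu1, mu2 = X.mean(0), X1.mean(0), X2.mean(0)

# Scatter matrices as defined in the text: S_B between classes,
# S_W within classes, S the (positive semi-definite) total scatter
S_B = np.outer(mu1 - mu, mu1 - mu) + np.outer(mu2 - mu, mu2 - mu)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S = S_B + S_W
eps, t, alpha = 1e-6, 0.5, 1.0   # assumed values for the sketch

def grad_F(e):
    """Gradient of F_t(e) as given in the hunk at paper.md line 268."""
    q = e @ S_W @ e + eps
    return ((1 - t) * (S @ e)
            + (t / q) * (S_B @ e)
            - t * (e @ S_B @ e) / q**2 * (S_W @ e))

def T(e):
    """One step of the normalized update T(e) = (alpha*e + grad F_t) / ||.||."""
    v = alpha * e + grad_F(e)
    return v / np.linalg.norm(v)

e = np.ones(4) / 2.0             # unit-norm starting vector
for _ in range(500):             # iterate e_{n+1} = T(e_n) to a fixed point
    e = T(e)
# At convergence e satisfies the stationarity condition grad F_t(e) ∝ e
```

The $\alpha e$ term plays exactly the role described in the text: it keeps the update's component along $e$ strictly positive, so the normalization never degenerates and the iteration settles on a unit vector satisfying the stationarity condition.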