author     Vasil Zlatanov <v@skozl.com>    2019-03-20 22:47:53 +0000
committer  Vasil Zlatanov <v@skozl.com>    2019-03-20 22:47:53 +0000
commit     a283b7fb373851d42113f2a44203ffeb8777f039 (patch)
tree       fdc9651bf8d8e7bb747a72a280d9ed431ef20b20
parent     d8f9a4863c5a2002495fd698cc1d9517254f28c4 (diff)
download   e3-deep-a283b7fb373851d42113f2a44203ffeb8777f039.tar.gz
           e3-deep-a283b7fb373851d42113f2a44203ffeb8777f039.tar.bz2
           e3-deep-a283b7fb373851d42113f2a44203ffeb8777f039.zip
Getting there
-rw-r--r--  report/bibliography.bib             22
-rw-r--r--  report/build/cw2_vz215_np1915.pdf   bin 279862 -> 0 bytes
-rw-r--r--  report/makefile                      2
-rw-r--r--  report/metadata.yaml                 2
-rw-r--r--  report/paper.md                     14
-rw-r--r--  report/template.latex              287
6 files changed, 82 insertions, 245 deletions
diff --git a/report/bibliography.bib b/report/bibliography.bib
index 0617fac..ddcf6c4 100644
--- a/report/bibliography.bib
+++ b/report/bibliography.bib
@@ -1,3 +1,25 @@
+@misc{defense,
+Author = {Alexander Hermans and Lucas Beyer and Bastian Leibe},
+Title = {In Defense of the Triplet Loss for Person Re-Identification},
+Year = {2017},
+Eprint = {arXiv:1703.07737},
+}
+
+@misc{hardnet,
+Author = {Anastasiya Mishchuk and Dmytro Mishkin and Filip Radenovic and Jiri Matas},
+Title = {Working hard to know your neighbor's margins: Local descriptor learning loss},
+Year = {2017},
+Eprint = {arXiv:1705.10872},
+}
+
+@article{facenet,
+Author = {Florian Schroff and Dmitry Kalenichenko and James Philbin},
+Title = {FaceNet: A Unified Embedding for Face Recognition and Clustering},
+Year = {2015},
+Eprint = {arXiv:1503.03832},
+Doi = {10.1109/CVPR.2015.7298682},
+}
+
@InProceedings{patches,
author={Vassileios Balntas and Karel Lenc and Andrea Vedaldi and Krystian Mikolajczyk},
title = {HPatches: A benchmark and evaluation of handcrafted and learned local descriptors},
diff --git a/report/build/cw2_vz215_np1915.pdf b/report/build/cw2_vz215_np1915.pdf
deleted file mode 100644
index f68d727..0000000
--- a/report/build/cw2_vz215_np1915.pdf
+++ /dev/null
Binary files differ
diff --git a/report/makefile b/report/makefile
index b77aca0..ec6b6e8 100644
--- a/report/makefile
+++ b/report/makefile
@@ -12,7 +12,7 @@ FLAGS_PDF = --template=template.latex
pdf:
mkdir -p $(OUTPUT)
- pandoc -o $(OUTPUT)/cw2_vz215_np1915.pdf $(FLAGS) $(FLAGS_PDF) $(FILES)
+ pandoc -o $(OUTPUT)/cw_interim_vz215.pdf $(FLAGS) $(FLAGS_PDF) $(FILES)
clean:
rm build/*
diff --git a/report/metadata.yaml b/report/metadata.yaml
index 8be0a21..9aaecea 100644
--- a/report/metadata.yaml
+++ b/report/metadata.yaml
@@ -7,6 +7,6 @@ numbersections: yes
lang: en
babel-lang: english
abstract: |
- This is my abstract
+  In this interim report we formulate the machine learning problem and outline the baseline model. We present training curves with improved parameters for training speed and evaluate the baseline. Improvements, such as a non-linear loss and a deeper denoising network, are suggested.
...
diff --git a/report/paper.md b/report/paper.md
index 87c22ab..2829952 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -88,20 +88,26 @@ Where $S$ is the number of sequences in the batch, $K$ is the number of images i
Batch loss presents a series of problems when applied to the HPatches dataset. Implementations of batch triplet loss often use randomly sampled batches. For a dataset like MNIST, which has only 10 classes, this is not a problem, as it is very unlikely for a batch to contain no valid triplets. In the HPatches dataset, image sequences tend to have over 1000 patches, meaning that the probability of a randomly sampled batch containing no valid triplets is very high. In these situations the loss is meaningless and hence random batch sampling is unfeasible. The first test of batch loss failed because of this and required an alternative solution.
-We therefore implemented batches of size $SK$ formed with $S$ number patch sequences containg $K \geq 2$ patches. The anchor positive permutation's are therefor $(K-1)$ possible positives for each anchor, and $(S-1)K)$ negatives for each pair. With a guarranteed total number of $K^2(K-1)(S-1)$ triplets. This has the added benefit of allowing the positive and negative distances masks to be precomputed based on the $S$ and $K$ as the patches are ordered. It should be noted that the difficult of the batch is highly reliant both $SK$ and $K$. The larger $K$ the more likely it is to have a harder the furthest anchor-postive pair, and the bigger the batch size $SK$ the more likely it is to find a close negative.
+We therefore implemented batches of size $SK$, formed from $S$ patch sequences containing $K \geq 2$ patches each. The anchor-positive permutations then give $(K-1)$ possible positives for each anchor and $(S-1)K$ negatives for each anchor-positive pair, guaranteeing a total of $SK^2(K-1)(S-1)$ valid triplets. This has the added benefit of allowing the positive and negative distance masks to be precomputed from $S$ and $K$, as the patches are ordered. It should be noted that the difficulty of the batch is highly reliant on both $SK$ and $K$: the larger $K$ is, the harder the furthest anchor-positive pair is likely to be, and the bigger the batch size $SK$, the more likely it is that a close negative can be found.
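As an illustration of this mask precomputation (a minimal sketch under the ordered-batch assumption, not the repository's actual code; the helper name `build_masks` is hypothetical):

```python
import numpy as np

def build_masks(S, K):
    """Precompute boolean masks for an ordered batch of S sequences x K patches.

    positive_mask[i, j] is True when patches i and j come from the same
    sequence (and i != j); negative_mask[i, j] is True when they come
    from different sequences.
    """
    labels = np.repeat(np.arange(S), K)            # sequence id of each of the S*K patches
    same_seq = labels[:, None] == labels[None, :]
    positive_mask = same_seq & ~np.eye(S * K, dtype=bool)
    negative_mask = ~same_seq
    return positive_mask, negative_mask

# Example: S=2 sequences, K=3 patches each -> 2 positives and 3 negatives per anchor
pos, neg = build_masks(2, 3)
assert pos.sum(axis=1).tolist() == [2] * 6
assert neg.sum(axis=1).tolist() == [3] * 6
```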
## Collapse
-Even after the implementation of symmetric batch formation, we were unable to train the descriptor model without having the loss becomming stuck at the margin, that is $\loss{BH} = \alpha$. Upon observation of the descriptors it was evident that the descriptor model producing descriptors with all dimensions approaching zero, regardless of the input. A naive solution is to apply $L2$ normalisation to the descriptors, but that does not avoid collapse as the network learns to output the same descriptor every time.
+Even after the implementation of symmetric batch formation, we were unable to train the descriptor model without the loss becoming stuck at the margin, that is $\loss{BH} = \alpha$. Upon observation of the descriptors it was evident that the model was producing descriptors with all dimensions approaching zero, regardless of the input. A naive solution is to apply $L2$ normalisation to the descriptors, but this does not avoid collapse, as the network simply learns to output the same descriptor every time, and it further unnecessarily restricts the descriptors to the unit hypersphere.
-Avoiding collapse of all the descriptors to a single point proved to not be an easy task. Upon experimentation, we find that descriptors tend to collapse for large batch sizes.
+Avoiding collapse of all the descriptors to a single point proved to be a hard task. Upon experimentation, we find that descriptors tend to collapse for large batch sizes with a large $K$. Initial experiments, which eagerly used large batch sizes in order to utilise TPU acceleration, would invariably collapse to the erroneous local minimum of identical descriptors. Collapse is ultimately avoided with a low learning rate and small/easy batches. This significantly limits the advantages of batch hard loss, and makes the network extremely slow and hard to train.
+
+We further find that squared Euclidean distance, $D\left(\nnfn(x_i), \nnfn(x_j)\right) = \norm{\nnfn(x_i) - \nnfn(x_j)}_2^2$, while cheaper to compute, is much more prone to collapse (in fact we did not successfully train a network with squared Euclidean distance). A. Hermans et al. make a similar observation about squared Euclidean distance on the MARS dataset [@defense].
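For reference, a hedged sketch of the two distance variants computed over a batch of descriptors; the function names and the small `eps` added before the square root (to keep the gradient finite at zero distance) are assumptions of this sketch, not taken from the report's code:

```python
import tensorflow as tf

def pairwise_sq_euclidean(d):
    """Squared Euclidean distances between all rows of d (shape [SK, dim])."""
    sq_norms = tf.reduce_sum(tf.square(d), axis=1)
    return sq_norms[:, None] - 2.0 * tf.matmul(d, d, transpose_b=True) + sq_norms[None, :]

def pairwise_euclidean(d, eps=1e-8):
    """Non-squared variant; eps keeps the sqrt gradient finite at zero distance."""
    return tf.sqrt(tf.maximum(pairwise_sq_euclidean(d), 0.0) + eps)
```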
## Progressive hard batch mining
+We find that while collapse is hard to avoid for large batches with $K > 3$, the risk of collapse disappears once we are able to reach $\loss{BH} < \alpha$, after which a more aggressive learning rate can be used. We were able to train the model with progressively larger batches by starting from a baseline trained model and increasing the batch size step by step, training the network while maintaining the loss below the margin to avoid collapse.
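A minimal sketch of such a progressive schedule, assuming a compiled Keras model trained with the batch hard loss and a hypothetical `make_batches(S, K)` generator; the staging values and epoch limits are illustrative, not the report's actual settings:

```python
def progressive_training(model, make_batches,
                         stages=((4, 2), (8, 2), (16, 4), (32, 4)),
                         alpha=1.0, max_epochs_per_stage=50):
    """Grow the batch only once the batch-hard loss sits safely below the margin.

    `model` is a compiled Keras model whose loss is the batch-hard triplet loss;
    `make_batches(S, K)` yields batches of S ordered sequences with K patches
    each. Both interfaces are assumed for illustration.
    """
    for S, K in stages:
        for _ in range(max_epochs_per_stage):
            history = model.fit_generator(make_batches(S, K),
                                          steps_per_epoch=100,
                                          epochs=1, verbose=0)
            if history.history["loss"][-1] < alpha:   # below margin: safe to enlarge batch
                break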
+
## Soft Margin Loss
-* Softplus function
+One of the properties of triplet loss as implemented with the margin-based hinge loss function $\left[m + \bullet\right]_+$ is that it avoids optimising on already sub-marginal "easy" triplets, whose loss is set to zero. However, it is possible to replace the loss function such that it still produces a small amount of loss for these correct triplets, so that they can be pulled even closer together. We use the `keras.backend.softplus` function, which decays exponentially towards zero for negative arguments instead of imposing a hard cut-off. A. Hermans et al. [@defense] refer to this as the soft margin formulation and implement it as $\ln(1 + \exp(\bullet))$, which is identical to Keras' `softplus` function.
+
+It is important to note that the point of collapse when using the soft margin formulation with $\alpha = 0$ is approximately $0.69$ (i.e. $\ln 2$). This matters because, per our training strategy, it is desirable to progressively increase the batch size such that the loss stays below the point of collapse.
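As an illustrative sketch of the two formulations, assuming the furthest-positive and closest-negative distances per anchor have already been computed (the helper name `batch_hard_loss` is hypothetical):

```python
from keras import backend as K

def batch_hard_loss(hardest_pos, hardest_neg, margin=None):
    """Batch-hard triplet loss per anchor, averaged over the batch.

    hardest_pos / hardest_neg hold, for each anchor, the distance to its
    furthest positive and closest negative. With a numeric `margin` the
    classic hinge [margin + pos - neg]_+ is used; with margin=None the
    soft margin ln(1 + exp(pos - neg)) = softplus(pos - neg) is used,
    which collapses to ln(2) ~= 0.69 when all descriptors are identical.
    """
    diff = hardest_pos - hardest_neg
    if margin is None:
        return K.mean(K.softplus(diff))
    return K.mean(K.maximum(margin + diff, 0.0))
```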
# Visualisation
diff --git a/report/template.latex b/report/template.latex
index d459163..da5bdd2 100644
--- a/report/template.latex
+++ b/report/template.latex
@@ -1,253 +1,57 @@
-\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$babel-lang$,$endif$$if(papersize)$$papersize$paper,$endif$$for(classoption)$$classoption$$sep$,$endfor$]{IEEEtran}
-$if(beamerarticle)$
-\usepackage{beamerarticle} % needs to be loaded first
-$endif$
-$if(fontfamily)$
-\usepackage[$for(fontfamilyoptions)$$fontfamilyoptions$$sep$,$endfor$]{$fontfamily$}
-$else$
-\usepackage{lmodern}
-$endif$
-\usepackage{float}
-$if(linestretch)$
-\usepackage{setspace}
-\setstretch{$linestretch$}
-$endif$
-\usepackage{amssymb,amsmath}
-\usepackage{ifxetex,ifluatex}
-\usepackage{fixltx2e} % provides \textsubscript
-\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
- \usepackage[$if(fontenc)$$fontenc$$else$T1$endif$]{fontenc}
- \usepackage[utf8]{inputenc}
-$if(euro)$
- \usepackage{eurosym}
-$endif$
-\else % if luatex or xelatex
- \ifxetex
- \usepackage{mathspec}
- \else
- \usepackage{fontspec}
- \fi
- \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
-$for(fontfamilies)$
- \newfontfamily{$fontfamilies.name$}[$fontfamilies.options$]{$fontfamilies.font$}
-$endfor$
-$if(euro)$
- \newcommand{\euro}{€}
-$endif$
-$if(mainfont)$
- \setmainfont[$for(mainfontoptions)$$mainfontoptions$$sep$,$endfor$]{$mainfont$}
-$endif$
-$if(sansfont)$
- \setsansfont[$for(sansfontoptions)$$sansfontoptions$$sep$,$endfor$]{$sansfont$}
-$endif$
-$if(monofont)$
- \setmonofont[Mapping=tex-ansi$if(monofontoptions)$,$for(monofontoptions)$$monofontoptions$$sep$,$endfor$$endif$]{$monofont$}
-$endif$
-$if(mathfont)$
- \setmathfont(Digits,Latin,Greek)[$for(mathfontoptions)$$mathfontoptions$$sep$,$endfor$]{$mathfont$}
-$endif$
-$if(CJKmainfont)$
- \usepackage{xeCJK}
- \setCJKmainfont[$for(CJKoptions)$$CJKoptions$$sep$,$endfor$]{$CJKmainfont$}
-$endif$
-\fi
-% use upquote if available, for straight quotes in verbatim environments
-\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
-% use microtype if available
-\IfFileExists{microtype.sty}{%
-\usepackage{microtype}
-\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
-}{}
-$if(geometry)$
-\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
-$endif$
-\usepackage[unicode=true]{hyperref}
-$if(colorlinks)$
-\PassOptionsToPackage{usenames,dvipsnames}{color} % color is loaded by hyperref
-$endif$
-\hypersetup{
-$if(title-meta)$
- pdftitle={$title-meta$},
-$endif$
-$if(author-meta)$
- pdfauthor={$author-meta$},
-$endif$
-$if(keywords)$
- pdfkeywords={$for(keywords)$$keywords$$sep$, $endfor$},
-$endif$
-$if(colorlinks)$
- colorlinks=true,
- linkcolor=$if(linkcolor)$$linkcolor$$else$Maroon$endif$,
- citecolor=$if(citecolor)$$citecolor$$else$Blue$endif$,
- urlcolor=$if(urlcolor)$$urlcolor$$else$Blue$endif$,
-$else$
- pdfborder={0 0 0},
-$endif$
- breaklinks=true}
-\urlstyle{same} % don't use monospace font for urls
-$if(lang)$
-\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
- \usepackage[shorthands=off,$for(babel-otherlangs)$$babel-otherlangs$,$endfor$main=$babel-lang$]{babel}
-$if(babel-newcommands)$
- $babel-newcommands$
-$endif$
-\else
- \usepackage{polyglossia}
- \setmainlanguage[$polyglossia-lang.options$]{$polyglossia-lang.name$}
-$for(polyglossia-otherlangs)$
- \setotherlanguage[$polyglossia-otherlangs.options$]{$polyglossia-otherlangs.name$}
-$endfor$
-\fi
-$endif$
-$if(natbib)$
-\usepackage{natbib}
-\bibliographystyle{$if(biblio-style)$$biblio-style$$else$plainnat$endif$}
-$endif$
-$if(biblatex)$
-\usepackage[$if(biblio-style)$style=$biblio-style$,$endif$$for(biblatexoptions)$$biblatexoptions$$sep$,$endfor$]{biblatex}
-$for(bibliography)$
-\addbibresource{$bibliography$}
-$endfor$
-$endif$
-$if(listings)$
-\usepackage{listings}
-$endif$
-$if(lhs)$
-\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
-$endif$
-$if(highlighting-macros)$
-$highlighting-macros$
-$endif$
-$if(verbatim-in-note)$
-\usepackage{fancyvrb}
-\VerbatimFootnotes % allows verbatim text in footnotes
-$endif$
-$if(tables)$
-\usepackage{longtable,booktabs}
-% Fix footnotes in tables (requires footnote package)
-\IfFileExists{footnote.sty}{\usepackage{footnote}\makesavenoteenv{long table}}{}
-$endif$
-\usepackage{graphicx}
-$if(graphics)$
-\usepackage{graphicx,grffile}
-\makeatletter
-\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
-\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
-\makeatother
-% Scale images if necessary, so that they will not overflow the page
-% margins by default, and it is still possible to overwrite the defaults
-% using explicit options in \includegraphics[width, height, ...]{}
-\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
-$endif$
-$if(links-as-notes)$
-% Make links footnotes instead of hotlinks:
-\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
-$endif$
-$if(strikeout)$
-\usepackage[normalem]{ulem}
-% avoid problems with \sout in headers with hyperref:
-\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
-$endif$
-$if(indent)$
-$else$
-\IfFileExists{parskip.sty}{%
-\usepackage{parskip}
-}{% else
-\setlength{\parindent}{0pt}
-\setlength{\parskip}{6pt plus 2pt minus 1pt}
-}
-$endif$
-\setlength{\emergencystretch}{3em} % prevent overfull lines
-\providecommand{\tightlist}{%
- \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
-$if(numbersections)$
-\setcounter{secnumdepth}{$if(secnumdepth)$$secnumdepth$$else$5$endif$}
-$else$
-\setcounter{secnumdepth}{0}
-$endif$
-$if(subparagraph)$
-$else$
-% Redefines (sub)paragraphs to behave more like sections
-\ifx\paragraph\undefined\else
-\let\oldparagraph\paragraph
-\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
-\fi
-\ifx\subparagraph\undefined\else
-\let\oldsubparagraph\subparagraph
-\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
-\fi
-$endif$
-$if(dir)$
-\ifxetex
- % load bidi as late as possible as it modifies e.g. graphicx
- $if(latex-dir-rtl)$
- \usepackage[RTLdocument]{bidi}
- $else$
- \usepackage{bidi}
- $endif$
-\fi
-\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
- \TeXXeTstate=1
- \newcommand{\RL}[1]{\beginR #1\endR}
- \newcommand{\LR}[1]{\beginL #1\endL}
- \newenvironment{RTL}{\beginR}{\endR}
- \newenvironment{LTR}{\beginL}{\endL}
-\fi
-$endif$
+\documentclass[10pt,twocolumn,letterpaper]{article}
-% set default figure placement to htbp
-\makeatletter
-\def\fps@figure{htbp}
-\makeatother
+\usepackage{cvpr}
+\usepackage{times}
+\usepackage{epsfig}
+\usepackage{graphicx}
+\usepackage{amsmath}
+\usepackage{amssymb}
-$for(header-includes)$
-$header-includes$
-$endfor$
+% Include other packages here, before hyperref.
-$if(title)$
-\title{$title$$if(thanks)$\thanks{$thanks$}$endif$}
-$endif$
-$if(subtitle)$
-\providecommand{\subtitle}[1]{}
-\subtitle{$subtitle$}
-$endif$
+% If you comment hyperref and then uncomment it, you should delete
+% egpaper.aux before re-running latex. (Or just hit 'q' on the first latex
+% run, let it finish, and you should be clear).
+\usepackage[breaklinks=true,bookmarks=false]{hyperref}
-$if(author)$
-\author{
- $for(author)$
- \IEEEauthorblockN{$author.name$}
- \IEEEauthorblockA{%
- $author.affiliation$ \\
- $author.email$ \\
- $author.link$}
- $sep$ \and
- $endfor$
-}
-$endif$
+\cvprfinalcopy % *** Uncomment this line for the final submission
-$if(institute)$
-\providecommand{\institute}[1]{}
-\institute{$for(institute)$$institute$$sep$ \and $endfor$}
-$endif$
-\date{$date$}
+\def\cvprPaperID{****} % *** Enter the CVPR Paper ID here
+\def\httilde{\mbox{\tt\raisebox{-.5ex}{\symbol{126}}}}
+% Pages are numbered in submission mode, and unnumbered in camera-ready
+%\ifcvprfinal\pagestyle{empty}\fi
+\setcounter{page}{1}
\begin{document}
-$if(title)$
+
+%%%%%%%%% TITLE
+\title{EE3-25 Deep Learning Report}
+
+\author{Vasil Zlatanov\\
+01120518\\
+{\tt\small vz215@ic.ac.uk}
+% For a paper whose authors are all at the same institution,
+% omit the following lines up until the closing ``}''.
+% Additional authors and addresses can be added with ``\and'',
+% just like the second author.
+% To save space, use either the email address or home page, not both
+% \and
+% Second Author\\
+% ICL login\\
+% CID\\
+% {\tt\small secondauthor@i2.org}
+}
+
\maketitle
-$endif$
-$if(abstract)$
+%\thispagestyle{empty}
+
+%%%%%%%%% ABSTRACT
\begin{abstract}
-$abstract$
+ Abstract - ...
\end{abstract}
-$endif$
-$if(keywords)$
-\begin{IEEEkeywords}
-$for(keywords)$
- $keywords$$sep$;
-$endfor$
-\end{IEEEkeywords}
-$endif$
+\providecommand{\tightlist}{%
+ \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
$for(include-before)$
$include-before$
@@ -268,6 +72,11 @@ $endif$
$if(lof)$
\listoffigures
$endif$
+
+\newcommand{\loss}[1]{\mathcal{L}_\textnormal{#1}}
+\newcommand{\nnfn}{f_\theta}
+\newcommand{\norm}[1]{\left\lVert#1\right\rVert} % Thanks http://tex.stackexchange.com/a/107190
+
$body$
$if(natbib)$