authorVasil Zlatanov <v@skozl.com>2019-03-21 23:23:09 +0000
committerVasil Zlatanov <v@skozl.com>2019-03-21 23:23:09 +0000
commitf38dcc474ce7b97f37b5a226ad66043199402b0f (patch)
tree035bcec8c7d8c0b73aa51422c1ae4acf5b661b9a
parenta6e407b86854cc336684723d251746d0fd1f28e8 (diff)
basically done
-rw-r--r-- report/bibliography.bib |  7
-rw-r--r-- report/paper.md         | 32
2 files changed, 24 insertions, 15 deletions
diff --git a/report/bibliography.bib b/report/bibliography.bib
index ddcf6c4..dcb2b2d 100644
--- a/report/bibliography.bib
+++ b/report/bibliography.bib
@@ -33,3 +33,10 @@ booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVP
month = {July},
year = {2017}
}
+
+@misc{unet,
+author = {Olaf Ronneberger and Philipp Fischer and Thomas Brox},
+title = {U-Net: Convolutional Networks for Biomedical Image Segmentation},
+year = {2015},
+eprint = {arXiv:1505.04597},
+}
diff --git a/report/paper.md b/report/paper.md
index 3208e88..fffecc2 100644
--- a/report/paper.md
+++ b/report/paper.md
@@ -2,7 +2,7 @@
This coursework's goal is to develop an image representation of patches from the `HPatches` dataset for the purpose of matching and retrieval, where images with the same features (and classes) are nearby in the reduced Euclidean space of the descriptors, while dissimilar ones are far apart. The dataset contains patches sampled from image sequences taken of the same scenes. For each image sequence there is a reference image, and two more files `eX.png` and `hX.png` containing corresponding patches from the images in the sequence with altered illumination (`i_X`) or viewpoint (`v_X`). Furthermore, corresponding patches are extracted with geometric noise: easy `e_X` patches have a small amount of jitter, while hard `h_X` patches have a larger amount [@patches]. The patches as processed by our networks are monochrome 32 by 32 images. A series of eight images from the same sequence is shown in figure \ref{sequence}.
-![Sequence from the HPatches dataset\label{sequence}](fig/sequence.png){width=20em height=15em}
+![Sequence from the HPatches dataset\label{sequence}](fig/sequence.png){width=20em height=12em}
The goal is to train a network which, given a patch, is able to produce a descriptor vector with a dimension of 128. The descriptors are evaluated based on their performance across three tasks:
@@ -35,14 +35,14 @@ Batch Size & CPU & GPU & TPU & \\ \hline
### U-Net Denoising
-A shallow version of the U-Net network is used to denoise the noisy patches. The shallow U-Net network has the same output size as the input size, is fed a noisy image and has loss computed as the mean average Euclidean distance (L1 loss) with a clean reference patch. This effectively teaches the U-Net autoencoder to perform a denoising operation on the input images.
+A shallow version of the U-Net network [@unet] is used to denoise the noisy patches. The shallow U-Net has the same output size as its input size, is fed a noisy image, and has its loss computed as the mean absolute error (L1 loss) against a clean reference patch. This effectively teaches the U-Net autoencoder to perform a denoising operation on the input images.
Efficient training can be performed with TPU acceleration, a batch size of 4096 and the Adam optimizer with a learning rate of 0.001, and is shown in figure \ref{fig:denoise}. Training and validation were performed with all available data.
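
A minimal Keras sketch of such a shallow denoiser is shown below; the filter counts and layer arrangement are illustrative assumptions, not the exact architecture, but the L1 objective and optimizer match the setup above:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def shallow_unet(size=32):
    inp = layers.Input((size, size, 1))
    # Encoder: one downsampling step, keeping c1 as the skip connection
    c1 = layers.Conv2D(32, 3, activation='relu', padding='same')(inp)
    p1 = layers.MaxPooling2D(2)(c1)
    # Bottleneck
    b = layers.Conv2D(64, 3, activation='relu', padding='same')(p1)
    # Decoder: upsample and concatenate the skip connection
    u1 = layers.UpSampling2D(2)(b)
    m1 = layers.Concatenate()([u1, c1])
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(m1)
    out = layers.Conv2D(1, 1, activation='linear')(c2)  # same size as input
    return Model(inp, out)

model = shallow_unet()
# L1 (mean absolute error) loss against the clean reference patch
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss='mean_absolute_error')
# model.fit(noisy_patches, clean_patches, batch_size=4096, epochs=19)
```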
The network is able to achieve a mean absolute error of 5.3 after 19 epochs. With gradient descent we observed a loss of 5.5 after the same number of epochs. There is no observed evidence of overfitting with the shallow net, something which may be expected with such a shallow network. An example of denoising as performed by the network is visible in figure \ref{fig:den3}.
Quick experimentation with a deeper version of U-Net shows it is possible to achieve a validation loss below 5.0 after training for 10 epochs, and a loss equivalent to the shallow network's 5.3 is achievable after only 3 epochs.
![Denoise example - 20th epoch\label{fig:den3}](fig/denoised.png){width=20em height=15em}
-**Talk about max performance**
+Visualisation of denoising as seen in figure \ref{fig:den3} demonstrates the impressive performance of the denoising model. We are able to use a deeper U-Net containing 3 downsampling and 3 upsampling layers, as per the architecture of Ronneberger et al. [@unet], to achieve a mean absolute error of 4.5. We find that at this loss the network converges, and we therefore focus our attention on the descriptor model.
### L2 Net
@@ -50,7 +50,7 @@ The network used to output the 128 dimension descriptors is a L2-network with tr
### Triplet Loss
-The loss used for Siamese L2-Net as implemented in the baseline can be formulate as:
+The loss used for the Siamese L2-Net as implemented in the baseline can be formulated as:
\begin{equation}\label{eq:loss_trip}
\loss{tri}(\theta) = \frac{1}{N} \sum\limits_{\substack{a,p,n \\ y_a = y_p \neq y_n}} \left[ D_{a,p} - D_{a,n} + \alpha \right]_+.
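
As a minimal numpy sketch of equation \ref{eq:loss_trip} (the distance arrays are assumed precomputed from the descriptors of each triplet):

```python
import numpy as np

def triplet_loss(d_ap, d_an, alpha):
    # d_ap, d_an: anchor-positive and anchor-negative distances for
    # N triplets; the hinge [.]_+ becomes a maximum with zero.
    return np.mean(np.maximum(d_ap - d_an + alpha, 0.0))
```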
@@ -102,11 +102,11 @@ We find that for large batch sizes with $K > 3$ avoiding collapse when training
While doing so, we observed that the risk of collapse disappears if we are able to reach $\loss{BH} < \alpha$. After the loss has dipped under the margin, we find that we may increase the learning rate, as the collapsed state has a higher loss and the optimizer therefore has no incentive to push the network into collapse.
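
For reference, a numpy sketch of the batch-hard mining (our reading of the standard hardest-positive/hardest-negative formulation; array shapes are illustrative):

```python
import numpy as np

def batch_hard_loss(desc, labels, alpha):
    # desc: (B, 128) descriptors; labels: (B,) sequence ids.
    # Pairwise Euclidean distances between all descriptors in the batch.
    d = np.linalg.norm(desc[:, None, :] - desc[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, d, -np.inf).max(axis=1)  # farthest positive
    hardest_neg = np.where(~same, d, np.inf).min(axis=1)  # closest negative
    return np.mean(np.maximum(hardest_pos - hardest_neg + alpha, 0.0))
```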
-Eventual training with batch size of of 2048 with $K=16$ (the maximum K for HPatches as not all sequences have more than 16 patches) was achieved by progressively increasing the batch size, starting with weights from the baseline model. We take care to increase the batch size only so much, such that the loss is consitently below $\alpha$. This training technique allows the use of the Adam optimizer and higher learning rates and can be done automatically. We call this method *stepped batch hard*, as to our knowledge this techinque has not been described in literature previously.
+Eventual training with a batch size of 1024 with $K=8$ (the maximum $K$ for HPatches is 16, as not all sequences have more than 16 patches) was achieved by progressively increasing the batch size, starting with weights from the baseline model. We take care to increase the batch size only as far as keeps the loss consistently below $\alpha$. This training technique allows the use of the Adam optimizer and higher learning rates, and can be done automatically. We call this method *stepped batch hard*, as to our knowledge this technique has not been described in the literature previously.
-Stepping was performed with batch sizes of 32, 64, 128, 256, 512, 1024 and finally 2048. With $K$ starting at 2 and eventually reaching 16. We performed training with the Adam optimizer with a learning rate of $2 \times 10^{-5}$.
+Stepping was performed with batch sizes of 32, 64, 128, 256, 512 and finally 1024, with $K$ starting at 2 and eventually reaching 8. We performed training with the Adam optimizer and a learning rate of $2 \times 10^{-5}$. The learning curve, with the jumps in batch size visible, can be seen in figure \ref{stepped}; a sketch of the stepping schedule follows the figure.
-->> Graph of stepped batch hard
+![Stepped Batch Training \label{stepped}](fig/augmentation.png){width=20em height=12em}
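
A sketch of the stepping schedule; `train_epochs` is a hypothetical stand-in for the actual training loop, and the intermediate $K$ values are illustrative assumptions:

```python
# Sketch of stepped batch hard. train_epochs() is a hypothetical
# stand-in for the real training loop; intermediate K values are
# illustrative assumptions.
alpha = 1.0
schedule = [(32, 2), (64, 2), (128, 4), (256, 4), (512, 8), (1024, 8)]

for batch_size, k in schedule:
    loss = train_epochs(batch_size=batch_size, patches_per_sequence=k,
                        optimizer="adam", learning_rate=2e-5)
    # Only step up while the batch-hard loss stays below the margin;
    # otherwise the optimizer may push the network into collapse.
    if loss >= alpha:
        break
```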
## Soft Margin Loss
@@ -116,22 +116,22 @@ It is important to note that the point of collapse when using soft margin formul
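
As a sketch, a common soft margin formulation replaces the hinge with a softplus, removing the fixed margin $\alpha$; this is our assumption of the standard variant, shown for illustration:

```python
import numpy as np

def soft_margin_loss(d_ap, d_an):
    # Softplus ln(1 + e^x) replaces the hard hinge [x]_+, so the loss
    # decays asymptotically to zero instead of cutting off at a margin.
    return np.mean(np.logaddexp(0.0, d_ap - d_an))
```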
## Feature Augmentation
-We implement feature augmentation through random flips and rotations using numpy's `np.flip` and `np.rot90` functions. Nevertheless, we make a case that feature augmentation is detrimental for the HPatches dataset as patch sequences. An example sequence on figure \ref{augmenent} shows patches carry similar positional appearance across the same sequence, which is nullified by random flips and rotations. Experimentally we observe a near doubling of the loss, which reduces maximum batch hard size which fits below the collapse threshold and ultimately results in lower performance.
+We implement feature augmentation through random flips and rotations using numpy's `np.flip` and `np.rot90` functions, as sketched below. Nevertheless, we make a case that feature augmentation is detrimental for the HPatches dataset of patch sequences. An example sequence in figure \ref{augment} shows that patches carry similar positional appearance across the same sequence, which is nullified by random flips and rotations. Experimentally we observe a near doubling of the loss, which reduces the maximum batch hard size that fits below the collapse threshold and ultimately results in lower performance.
-![Non-Augmented Sequence\label{augment}](fig/augmentation.png){width=20em height=15em}
+![Non-Augmented Sequence\label{augment}](fig/augmentation.png){width=20em height=12em}
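
A sketch of the augmentation as described; the call site in the data pipeline is omitted:

```python
import numpy as np

def augment_patch(patch, rng=None):
    # Random flips and a random 90-degree rotation of a 32x32 patch,
    # using np.flip and np.rot90 as described above.
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=0)  # vertical flip
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=1)  # horizontal flip
    return np.rot90(patch, k=rng.integers(4))
```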
# Experimental Results
-\begin{table}[]
+\begin{table}[h!]
\begin{center}
\begin{tabular}{lrrr} \hline
Training Method & Verification & Matching & Retrieval \\ \hline
-Baseline & 0.813 & 0.317 & 0.544 \\
-Soft Baseline & 0.853 & 0.324 & 0.558 \\
-Batch Hard 128 & 0.873 & 0.356 & 0.645 \\
-Batch Hard 1024 & 0.873 & 0.356 & 0.645 \\
+Baseline & 0.821 & 0.249 & 0.544 \\
+Soft Baseline & 0.853 & 0.261 & 0.558 \\
+Batch Hard 128 & 0.857 & 0.312 & 0.618 \\
+Batch Hard 1024 & 0.863 & 0.342 & 0.625 \\
Soft Batch Hard 1024 & 0.873 & 0.356 & 0.645 \\ \hline
\end{tabular}
\label{results}
@@ -141,7 +141,9 @@ Soft Batch Hard 1024 & 0.873 & 0.356 & 0.645 \\ \hline
# Visualisation
-![2D Descriptor Visualisation with t-SNE (S=128;K=16)](fig/tsne.pdf){width=20em height=15em}
+We may leverage visualisation of the descriptors to identify whether descriptor sequences are separated as we expect them to be. Figure \ref{tsne} visualises the descriptor embeddings with t-SNE in 2 dimensions for a single batch of size 2048, containing 128 sequences with 16 patches each, as trained with batch hard. For a collapsed network all descriptors appear clustered at the same point, while for a well-trained network we see them separate into distinct clusters.
+
+![2D Descriptor Visualisation with t-SNE (S=128;K=16)\label{tsne}](fig/tsne.pdf){width=20em height=15em}
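
A minimal scikit-learn sketch of this projection; `descriptors` and `labels` are assumed to come from a forward pass of the trained descriptor model over one batch:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# descriptors: (2048, 128) array from the trained model (assumed);
# labels: (2048,) sequence index used to colour each patch.
embedded = TSNE(n_components=2).fit_transform(descriptors)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=2, cmap="tab20")
plt.savefig("fig/tsne.pdf")
```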
# Appendix