# K-means codebook

We randomly select 100k descriptors for K-means clustering to build the visual vocabulary (a subsample is used because of memory constraints). Open `main_guideline.m` and select/load the dataset:

```
[data_train, data_test] = getData('Caltech');
```

Set `showImg = 0` in `getData.m` if you want to stop displaying the training and testing images.

Complete `getData.m` by writing your own code to obtain the visual vocabulary and the bag-of-words histograms for both the training and testing data. Show, measure and discuss the following:

## Vocabulary size

## Bag-of-words histograms of example training/testing images

## Vector quantisation process

# RF classifier

Train and test a Random Forest classifier using the bag-of-words training and testing data obtained in Q1. Change the RF parameters (including the number of trees, the depth of the trees, the degree-of-randomness parameter and the type of weak learner, e.g. axis-aligned or two-pixel test), and show and discuss the results:

## Recognition accuracy and confusion matrix

## Example successes/failures

## Time efficiency of training/testing

## Impact of the vocabulary size on classification accuracy

# RF codebook

In Q1, replace K-means with a random forest codebook, i.e. apply RF to the 128-dimensional descriptor vectors with their image category labels and use the RF leaves as the visual vocabulary. With the bag-of-words representations of images obtained by the RF codebook, train and test a Random Forest classifier as in Q2. Try different parameters for the RF codebook and the RF classifier, and show/discuss the results in comparison with those of Q2, including the vector quantisation complexity.
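For Q1, the vocabulary construction, vector quantisation and histogram steps that `getData.m` has to perform can be sketched as below. This is a minimal, language-neutral illustration in plain Python rather than the MATLAB used by the coursework. `kmeans`, `quantise` and `bow_histogram` are hypothetical names, and in the usage the toy 2-D vectors stand in for the 100k subsampled 128-D descriptors. Each descriptor is assigned to its nearest centroid, which costs K distance computations of dimension D.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(descriptors, k, n_iter=20, seed=0):
    """Lloyd's K-means; returns k centroids, i.e. the visual words."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(descriptors, k)]  # random initialisation
    for _ in range(n_iter):
        # Assignment step: each descriptor joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[quantise(d, centroids)].append(d)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def quantise(descriptor, vocabulary):
    """Vector quantisation: index of the nearest visual word (O(K*D) per descriptor)."""
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))

def bow_histogram(image_descriptors, vocabulary):
    """L1-normalised bag-of-words histogram of one image."""
    hist = [0.0] * len(vocabulary)
    for d in image_descriptors:
        hist[quantise(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A larger K gives a finer vocabulary and typically sparser per-image histograms, while the quantisation cost grows linearly with K. This is the trade-off to measure for the vocabulary size item.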
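For Q2 and Q3, a minimal random forest can be sketched as follows, in plain Python rather than the coursework's MATLAB. It rests on several assumptions. The "degree of randomness" is interpreted as `n_trials`, the number of random candidate tests drawn per node, so fewer trials give more random trees. Split quality is measured by entropy-based information gain, and the trees are bagged. A "two-pixel test" on a descriptor is taken to mean thresholding the difference of two randomly chosen dimensions. All function names are hypothetical and do not come from `main_guideline.m` or `getData.m`. Trained on labelled 128-D descriptors instead of image histograms, the same forest gives the Q3 RF codebook: each leaf is a visual word, and quantising a descriptor costs about T × depth scalar comparisons instead of the K distance computations of K-means.

```python
import itertools
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def random_test(rng, data, kind):
    """Draw one random weak learner; returns a function x -> True (go left)."""
    dim = len(data[0])
    if kind == "axis-aligned":  # threshold a single dimension
        d = rng.randrange(dim)
        feature = lambda x: x[d]
    elif kind == "two-pixel":  # threshold the difference of two dimensions
        d1, d2 = rng.randrange(dim), rng.randrange(dim)
        feature = lambda x: x[d1] - x[d2]
    else:
        raise ValueError(f"unknown weak learner: {kind}")
    values = [feature(x) for x in data]
    t = rng.uniform(min(values), max(values))  # random threshold within the data range
    return lambda x: feature(x) < t

def grow(data, labels, depth, max_depth, n_trials, kind, rng, ids):
    """Grow one tree; each leaf stores a class histogram and a per-tree leaf id."""
    if depth == max_depth or len(set(labels)) == 1:
        return {"counts": Counter(labels), "id": next(ids)}
    parent_h, best = entropy(labels), None
    for _ in range(n_trials):  # more trials per node means less randomness
        test = random_test(rng, data, kind)
        go_left = [test(x) for x in data]
        left = [y for y, g in zip(labels, go_left) if g]
        right = [y for y, g in zip(labels, go_left) if not g]
        if not left or not right:
            continue  # degenerate split; draw another
        gain = parent_h - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if best is None or gain > best[0]:
            best = (gain, test, go_left)
    if best is None:  # no usable split found: make a leaf
        return {"counts": Counter(labels), "id": next(ids)}
    _, test, go_left = best
    children = []
    for side in (True, False):  # left child first, then right
        sub = [(x, y) for x, y, g in zip(data, labels, go_left) if g == side]
        children.append(grow([x for x, _ in sub], [y for _, y in sub],
                             depth + 1, max_depth, n_trials, kind, rng, ids))
    return {"test": test, "left": children[0], "right": children[1]}

def train_forest(data, labels, n_trees=10, max_depth=5, n_trials=10,
                 kind="axis-aligned", seed=0):
    """Bagged forest; returns a list of (root, n_leaves) pairs."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(data)) for _ in data]  # bootstrap sample
        ids = itertools.count()
        root = grow([data[i] for i in idx], [labels[i] for i in idx],
                    0, max_depth, n_trials, kind, rng, ids)
        forest.append((root, next(ids)))  # next(ids) equals the number of leaves
    return forest

def reach_leaf(node, x):
    """Route one vector down a tree to its leaf."""
    while "test" in node:
        node = node["left"] if node["test"](x) else node["right"]
    return node

def classify(forest, x):
    """Average normalised leaf class histograms over trees; return the top class."""
    votes = Counter()
    for root, _ in forest:
        counts = reach_leaf(root, x)["counts"]
        total = sum(counts.values())
        for label, c in counts.items():
            votes[label] += c / total
    return votes.most_common(1)[0][0]

def rf_codebook_histogram(forest, descriptors):
    """RF-codebook bag-of-words: every leaf of every tree is one visual word."""
    offsets = list(itertools.accumulate([0] + [n for _, n in forest[:-1]]))
    hist = [0] * sum(n for _, n in forest)
    for d in descriptors:
        for (root, _), off in zip(forest, offsets):
            hist[off + reach_leaf(root, d)["id"]] += 1
    return hist
```

Switching `kind` between `"axis-aligned"` and `"two-pixel"` changes only the weak learner. `n_trees`, `max_depth` and `n_trials` map to the other parameters listed for Q2, and the vocabulary size of the RF codebook is the total leaf count across the trees.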
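For the Q2 evaluation items (recognition accuracy, confusion matrix, and training/testing time), small helpers could look like the following sketch. The function names are hypothetical, and the class labels in the usage are placeholders rather than Caltech categories.

```python
import time

def confusion_matrix(true_labels, predicted_labels, classes):
    """Counts with rows = true class and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        cm[index[t]][index[p]] += 1
    return cm

def accuracy(cm):
    """Recognition accuracy: fraction of test images on the diagonal."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping the training and testing calls in `timed` separately gives the two time-efficiency figures. The off-diagonal entries of the confusion matrix point to the class pairs worth showing as example failures.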