Bag of Visual Words: is feature extraction even needed?

by Moran Reznik   Last Updated September 21, 2018 14:19 PM

I'm currently implementing a Bag of Visual Words (BoVW) model as part of my lab project. The algorithm's steps are as follows:

  1. split all photos into patches
  2. cluster these patches using K-means, based on the pixel values of each patch
  3. describe each photo as a histogram of how many of its patches belong to each cluster
  4. use an SVM to classify the data
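
For concreteness, steps 1 to 3 can be sketched roughly as below. This is only a minimal illustration under assumptions that are not part of my actual project: grayscale images, non-overlapping square patches, random arrays standing in for the photos, and a small hand-written K-means. The names `extract_patches`, `assign`, `kmeans`, `bovw_histogram`, `PATCH` and `K` are made up for this example.

```python
import numpy as np

def extract_patches(image, size):
    """Step 1: split a 2-D grayscale image into non-overlapping
    size x size patches, each flattened into a vector of pixel values."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]
    return np.array(patches, dtype=float)

def assign(points, centers):
    """Return the index of the nearest center for every point."""
    # Squared Euclidean distance from every point to every center.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Step 2: a small Lloyd-style K-means on raw patch vectors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(image, centers, size):
    """Step 3: normalized count of the image's patches per cluster."""
    labels = assign(extract_patches(image, size), centers)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / counts.sum()

# Hypothetical data: ten random 32x32 "photos" (assumption, not real data).
rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(10)]
PATCH, K = 8, 5

all_patches = np.vstack([extract_patches(img, PATCH) for img in images])
codebook = kmeans(all_patches, K)
features = np.array([bovw_histogram(img, codebook, PATCH) for img in images])
print(features.shape)  # one K-dimensional histogram per image
```

Each row of `features` is the histogram for one photo, and for step 4 these rows would be what gets passed to the SVM (for example scikit-learn's `sklearn.svm.SVC`, which is left out here to keep the sketch dependency-light).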

Now, my PI wants me to implement one more step: feature extraction. He suggested PCA, t-SNE, an auto-encoder, and so on. Many of the BoVW implementations I checked include such a step, usually an auto-encoder, so it's a reasonable request.
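
To make the request concrete, here is a rough sketch of what such a step could look like with PCA. It assumes the reduction is applied to the flattened patch vectors before clustering; that placement is my assumption, and it could just as well be applied to the final histograms. The data is random, and `pca_fit` and `pca_transform` are hypothetical helper names written for this example, not functions from any library.

```python
import numpy as np

def pca_fit(X, n_components):
    """Learn a PCA projection from the rows of X via SVD."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project rows of X onto the learned principal directions."""
    return (X - mean) @ components.T

# Stand-in for 160 flattened 8x8 patches (random, not real data).
rng = np.random.default_rng(0)
patches = rng.random((160, 64))

mean, components = pca_fit(patches, n_components=10)
reduced = pca_transform(patches, mean, components)
print(reduced.shape)  # 64 raw pixel values reduced to 10 per patch
```

In this arrangement K-means would then cluster the rows of `reduced` instead of the raw pixel values, and the rest of the pipeline would stay the same.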

The thing is: why is this step even needed? Isn't describing each photo as a histogram already feature extraction of some sort? Wouldn't adding this step add no value and just overcomplicate things?

Thank you all for any insights on this.
