How to apply Box Cox to train and test data?

by Tarun Abichandani   Last Updated July 02, 2018 13:19 PM

I am trying to standardize my data before performing prediction on it.

Some of the features in my data are skewed and hence I am applying Box Cox transformation to reduce skewness.

My data also contains negative values and zeros. Because the Box Cox transformation is only defined for strictly positive values, I shift my data set to make all values positive.

Using: F[i] = F[i] + 1 - min(F), where F is one of my features

Please note that my train and test data sets are different, and both have different means.

I need to apply the same transformation to the train as well as the test data set.

How do I apply it to the train and test data sets?

1) Should I apply Box Cox to the train data set, capture the parameters (the shifting constant used to shift the train data set, and lambda), and use the same parameters for the test data set?

2) Should I apply Box Cox to the train and test data sets independently, that is, without considering the train data set parameters while applying Box Cox to the test data set?

Answers 1

The data in the training and test sets should have the same meaning. Whether you standardize the data based on the mean and standard deviation or use a Box-Cox transformation, you should apply to the test set the means/SDs or lambda calculated on the training set.
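
A minimal, dependency-free sketch of this approach (option 1 in the question): the shift constant and lambda are estimated on the training feature only and then reused unchanged on the test feature. The function names `fit_boxcox` and `apply_boxcox`, the lambda grid from -2 to 2, and the sample values are assumptions made for illustration, not part of the original answer. Lambda is picked here by a coarse grid search over the Box-Cox profile log-likelihood; library routines such as SciPy's `scipy.stats.boxcox` also estimate it by maximum likelihood.

```python
import math


def boxcox(value, lam):
    """Box-Cox transform of a single strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(value)
    return (value ** lam - 1.0) / lam


def log_likelihood(values, lam):
    """Box-Cox profile log-likelihood of positive values for a given lambda."""
    n = len(values)
    transformed = [boxcox(v, lam) for v in values]
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox(train):
    """Estimate the shift and lambda from the TRAINING feature only."""
    # Same shift as in the question: F + 1 - min(F), so the training minimum maps to 1.
    shift = 1.0 - min(train)
    shifted = [v + shift for v in train]
    # Assumed search range: lambda in [-2, 2] with step 0.01.
    grid = [i / 100.0 for i in range(-200, 201)]
    lam = max(grid, key=lambda candidate: log_likelihood(shifted, candidate))
    return shift, lam


def apply_boxcox(values, shift, lam):
    """Transform any data set (train or test) with already fitted parameters."""
    shifted = [v + shift for v in values]
    if min(shifted) <= 0:
        raise ValueError("a shifted value is not positive; Box-Cox is undefined for it")
    return [boxcox(v, lam) for v in shifted]


# Illustrative values only.
train = [-3.0, 0.0, 1.0, 2.0, 10.0, 40.0]
test = [-1.0, 5.0, 20.0]

shift, lam = fit_boxcox(train)                  # parameters come from train only
train_t = apply_boxcox(train, shift, lam)
test_t = apply_boxcox(test, shift, lam)         # test reuses the train parameters
```

One caveat of reusing the training shift: a test value that lies 1 or more below the training minimum is still non-positive after shifting, so the sketch raises an error rather than transforming it. How to handle such values (clipping, choosing a larger shift up front, or switching to a transform that accepts negatives, such as Yeo-Johnson) is a separate decision.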

Jesus Herranz Valera
July 19, 2016 10:20 AM
