Reasoning for transforming a linear model to log-level

by Schoguan   Last Updated May 18, 2020 15:19 PM

I have a multiple linear regression model and found that my error terms are not normally distributed. The histogram of the dependent variable looks like the one below.

I am not sure how to proceed: what kind of transformation would be appropriate? I tried a log-level model (taking the log of the dependent variable) and found that all assumptions are fulfilled except homoskedasticity, for which I could use robust standard errors in the final model. However, I do not understand why taking the log would make any sense given the distribution of the data, as it is not skewed.
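
To make the log-level setup concrete, here is a minimal sketch with a single predictor and synthetic data (my actual model has several predictors, so this is a simplification). The function names `ols_simple`, `slope_se_classical` and `slope_se_hc0` are made up for illustration, and HC0 (White) is only one of several robust variance estimators.

```python
import math
import random


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def slope_se_classical(x, resid):
    """Standard error of the slope assuming constant error variance."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sigma2 = sum(e * e for e in resid) / (n - 2)
    return math.sqrt(sigma2 / sxx)


def slope_se_hc0(x, resid):
    """Heteroskedasticity-robust (White HC0) standard error of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    meat = sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid))
    return math.sqrt(meat) / sxx


# Synthetic data (assumption): log(y) is linear in x, noise spread grows with x.
random.seed(0)
x = [random.uniform(0, 4) for _ in range(500)]
y = [math.exp(1 + 0.5 * xi + random.gauss(0, 0.1 + 0.2 * xi)) for xi in x]

# Log-level model: regress log(y) on x.
log_y = [math.log(yi) for yi in y]
a, b, resid = ols_simple(x, log_y)

print("slope:", b)
print("classical SE:", slope_se_classical(x, resid))
print("HC0 robust SE:", slope_se_hc0(x, resid))
```

In a log-level model the slope is read as an approximate proportional change in y per unit change in x. The robust standard error leaves the coefficient itself unchanged and only adjusts its estimated uncertainty.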

Other transformations that I tried (squaring or taking the log of independent variables that are skewed) did not solve the problem of non-normally distributed error terms.

How would you proceed and with what reasoning? Thanks!

Graph that shows two overlapping processes

EDIT:

Also adding the graph of the error terms. Result of Shapiro-Wilk test for residuals was W = 0.99051, p-value = 0.07358.

Scatter plot of error terms

Answers 1

Instead of trying to make the data fit the model, I suggest getting a model that fits the data. Instead of OLS regression, you could try a method that does not rely on normally distributed errors, such as quantile regression, robust regression, or perhaps some sort of regression tree.
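
As a rough illustration of what robust regression buys you, here is a minimal sketch of the Theil-Sen estimator, which is just one simple robust method and only handles a single predictor (the question has several, so treat this as a toy). The function name `theil_sen` and the data are made up for the example.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen fit of y = intercept + slope*x.

    The slope is the median of all pairwise slopes, and the intercept is
    the median of y - slope*x, so a few extreme points have little influence.
    """
    points = list(zip(x, y))
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi  # skip pairs with identical x to avoid division by zero
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in points])
    return intercept, slope


# Toy data (assumption): an exact line y = 1 + 2x with one gross outlier.
x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[9] = 100  # outlier

intercept, slope = theil_sen(x, y)
print("intercept:", intercept, "slope:", slope)
```

Because the fit is built from medians, the single outlier does not pull the line toward it the way it would under least squares. For several predictors you would need a different robust or quantile method from a statistics package.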

Peter Flom
May 18, 2020 15:05 PM
