by Schoguan
Last Updated May 18, 2020 15:19 PM

I have a multiple linear regression model and found that its error terms are not normally distributed. The histogram of the dependent variable is shown below.

I am not sure how to proceed. What kind of transformation would be appropriate? I tried a log-level model (taking the log of the dependent variable) and found that all assumptions are fulfilled except homoskedasticity, which I could address by using robust standard errors in the final model. However, I do not understand why taking the log would make sense given the distribution of the data, as it is not skewed.

The other transformations I tried (squaring or taking the log of skewed independent variables) did not solve the problem of non-normally distributed error terms.

How would you proceed and with what reasoning? Thanks!

I am also adding the graph of the error terms. The Shapiro-Wilk test for the residuals gave W = 0.99051, p-value = 0.07358.

Instead of trying to make the data fit the model, I suggest finding a model that fits the data. Instead of OLS regression, you could try a method that does not assume normally distributed errors, such as quantile regression, robust regression, or perhaps some sort of regression tree.
