I have a multiple linear regression model and found that the error terms are not normally distributed. The histogram of the dependent variable is shown below.
I am not sure how to proceed - what kind of transformation would be appropriate? I tried a log-level model (taking the log of the dependent variable) and found that all assumptions are met except homoskedasticity: the residuals are heteroskedastic, which I could address with robust standard errors in the final model. However, I do not understand why taking the log would make sense given the distribution of the data, as it is not skewed.
Other transformations that I tried (squaring or taking the log of the skewed independent variables) did not solve the problem of non-normally distributed error terms.
How would you proceed and with what reasoning? Thanks!
Also adding the graph of the error terms. Result of Shapiro-Wilk test for residuals was W = 0.99051, p-value = 0.07358.
Instead of trying to make the data fit the model, I suggest finding a model that fits the data. Instead of OLS regression, you could try a method that does not assume normally distributed errors, such as quantile regression, robust regression, or perhaps some sort of regression tree.