# Is it valid to look at the impact of a feature on residuals?

by roundsquare   Last Updated September 19, 2019 21:19 PM

Background

I'm trying to measure the causal impact of action on outcome (sorry for the vague names, but trying to keep this general). My data consists of the following for each record:

• m other_features [the result of applying TruncatedSVD to a much larger set of features]
• action, which is binary
• outcome, which is continuous [in all cases here, I use the log of the actual value]

Based on business knowledge of the situation, I'm confident in a causal model that states:

• the other_features have a causal effect on action
• the other_features also have a causal effect on outcome
• the action has a causal effect on outcome

I tried two techniques and got different results, and I would like to know how to interpret them.

Attempt 1: Vanilla Linear Regression

First, I regressed outcome on other_features and action, and got the following results (I'm using $$\beta$$ to refer to linear coefficients):

• $$R^2 = 0.793$$
• $$\beta_{action} = 0.0943$$

This implies that action increases outcome by roughly 9.5%.
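A minimal sketch of this kind of regression on synthetic data, in case it helps to pin down the setup. It is not the real data or code: the sample size, number of components, coefficients, noise level and the use of NumPy least squares are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5000, 3  # hypothetical sample size and number of SVD components

# Synthetic data following the causal model above (all numbers are made up).
other_features = rng.normal(size=(n, m))
p_action = 1.0 / (1.0 + np.exp(-other_features[:, 0]))  # features drive action
action = (rng.random(n) < p_action).astype(float)
true_effect = 0.1  # assumed effect of action on log(outcome)
log_outcome = (
    other_features @ np.array([0.5, -0.3, 0.2])  # features drive outcome
    + true_effect * action                       # action drives outcome
    + rng.normal(scale=0.2, size=n)
)

# Attempt 1: regress log(outcome) on an intercept, other_features and action.
X = np.column_stack([np.ones(n), other_features, action])
coef, *_ = np.linalg.lstsq(X, log_outcome, rcond=None)
beta_action = coef[-1]

# R^2 of the fit.
fitted = X @ coef
ss_res = np.sum((log_outcome - fitted) ** 2)
ss_tot = np.sum((log_outcome - log_outcome.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Because outcome is logged, the exact percentage change is exp(beta) - 1.
pct_effect = np.expm1(beta_action)
print(f"R^2 = {r2:.3f}, beta_action = {beta_action:.4f}, effect = {pct_effect:.1%}")
```

Since outcome is logged, a coefficient converts to a percentage change as exp(β) - 1; for the coefficient reported above, exp(0.0943) - 1 ≈ 0.099, so the small-coefficient approximation and the exact conversion differ only slightly.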

Attempt 2: Looking at Action vs Residuals

Second, I regressed outcome on other_features alone (i.e. without action), and got the following results:

• $$R^2 = 0.793$$ [very little difference from the original regression].

Then, I looked at the residuals of those with and without action and got the following:

This seems to indicate that action has an impact of about -8.3% on outcome.
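The two-step procedure can be sketched as follows, again on synthetic data. As before, the data-generating numbers and variable names are assumptions for illustration only, and comparing group means of the residuals is one plausible reading of "looking at the residuals of those with and without action".

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5000, 3  # hypothetical sample size and number of SVD components

# Synthetic data following the causal model above (all numbers are made up).
other_features = rng.normal(size=(n, m))
p_action = 1.0 / (1.0 + np.exp(-other_features[:, 0]))  # features drive action
action = (rng.random(n) < p_action).astype(float)
log_outcome = (
    other_features @ np.array([0.5, -0.3, 0.2])
    + 0.1 * action  # assumed effect of action on log(outcome)
    + rng.normal(scale=0.2, size=n)
)

# Step 1: regress log(outcome) on an intercept and other_features only.
X = np.column_stack([np.ones(n), other_features])
coef, *_ = np.linalg.lstsq(X, log_outcome, rcond=None)
residuals = log_outcome - X @ coef

# Step 2: compare the residuals of records with and without action.
mean_with = residuals[action == 1].mean()
mean_without = residuals[action == 0].mean()
gap = mean_with - mean_without
print(f"mean residual with action:    {mean_with:+.4f}")
print(f"mean residual without action: {mean_without:+.4f}")
print(f"difference:                   {gap:+.4f}")
```

Note that in this sketch action never enters the first-stage fit, so whatever part of action is predictable from other_features has already been absorbed into the fitted values before the residuals are compared.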

Questions

1. Is attempt 2 a valid way to proceed?
2. If yes, how should I interpret the difference between the two approaches?
3. Is there anything I ought to be wary/careful/aware of in using this approach?

(Please let me know if additional data or results would be helpful in interpreting these results; I can supplement the information here.)
