I have a question about how to correct for selection bias in an experiment. In this experiment, we evaluate the impact of slight changes in phrasing on participation in web surveys. For example, if you tell people that most people participate, are they more inclined to participate themselves? The request comes at the end of a phone survey, when the interviewer asks the respondent whether they want to take part in future web surveys. We test several phrasings, which the interviewer reads aloud to the respondent.
I have two binary dependent variables. The first is whether the person says yes or no. The second is whether the person actually participates in the web survey, which takes place 3 or 4 days later. What I did at first was simply to estimate two logit models: I first model the probability of saying “yes” as a function of the treatment (the phrasing) and a few sociodemographic controls (age, gender, income, the usual). Then I model the probability of actually participating, using exactly the same explanatory variables.
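To make this setup concrete, here is a minimal sketch of the two logits on simulated data. It assumes NumPy and fits each logit by Newton-Raphson rather than with a statistics package. The variable names (treatment, age) and the data-generating process, in which the phrasing affects only saying “yes” and not following through, are hypothetical. A third logit, fitted on yes-sayers only, is added for comparison.

```python
import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logit fitted by Newton-Raphson.

    X must already contain an intercept column; y is a 0/1 array.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # predicted probabilities
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix X'WX
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))


rng = np.random.default_rng(0)
n = 5000

# Hypothetical covariates: 1 = "most people participate" phrasing, 0 = control.
treatment = rng.integers(0, 2, n)
age = rng.uniform(18, 80, n)
age_c = (age - 50) / 10                                # centred, in decades
X = np.column_stack([np.ones(n), treatment, age_c])

# Hypothetical data-generating process: the phrasing raises the chance of
# saying "yes" but has no effect on following through once someone said yes.
said_yes = rng.random(n) < logistic(-0.2 + 0.5 * treatment + 0.1 * age_c)
follows_through = rng.random(n) < logistic(-0.5 + 0.2 * age_c)
participated = said_yes & follows_through             # only yes-sayers can participate

# Model 1: P(yes | treatment, age) on the full sample.
beta_yes = fit_logit(X, said_yes.astype(float))
# Model 2: P(participate | treatment, age) on the full sample, same regressors.
beta_part = fit_logit(X, participated.astype(float))
# Comparison: P(participate | yes, treatment, age), yes-sayers only.
beta_follow = fit_logit(X[said_yes], participated[said_yes].astype(float))

for name, b in [("yes", beta_yes), ("participate", beta_part), ("participate | yes", beta_follow)]:
    print(f"{name:>18}: intercept={b[0]:.3f} treatment={b[1]:.3f} age={b[2]:.3f}")
```

With this simulated process, the full-sample participation logit shows a positive treatment coefficient even though the phrasing was set to have no effect on follow-through, while the logit on yes-sayers shows a coefficient near zero. In this simulation the two stages have independent errors, so restricting to yes-sayers is harmless; with correlated unobservables it would not be, which is the situation Heckman-type corrections are designed for.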
But my thesis director (this research is part of my PhD in economics) told me this was wrong because it fails to account for selection bias. The idea is that, since people have to say “yes” to the interviewer before they can participate (otherwise they are not put on the mailing list), the second model (participation) will be heavily influenced by what makes people say “yes”, rather than by what makes them actually participate.
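Because nobody can participate without first saying “yes”, the participation probability factors into two stages. With purely hypothetical numbers, a phrasing that changes only the first stage still moves overall participation:

```latex
\Pr(\text{part}=1 \mid x) = \Pr(\text{yes}=1 \mid x)\,\Pr(\text{part}=1 \mid \text{yes}=1, x)

% Hypothetical numbers: the phrasing raises Pr(yes) from 0.50 to 0.60,
% while Pr(part = 1 | yes = 1) stays at 0.40 in both groups.
\text{control: } 0.50 \times 0.40 = 0.20, \qquad
\text{treated: } 0.60 \times 0.40 = 0.24
```

So a full-sample participation model cannot, on its own, tell whether the phrasing affects follow-through or only the initial answer.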
In that respect, it is similar to the classic Heckman example, where the goal is to estimate the determinants of wages using data only on people who work, who may differ significantly from people who do not. So, naturally, my director advised me to look into the Heckman two-step model. But there are two problems. First, it seems to work only when the second model has a continuous dependent variable, whereas mine is binary (people either participate or they don’t). Second, and perhaps more importantly, I assume that exactly the same variables influence both saying “yes” and participating. This is a core part of the research hypothesis and I can’t really change it. But I’ve read that for the Heckman correction to be meaningful, the two models have to differ: the selection equation should contain at least one variable that does not appear in the outcome equation (an exclusion restriction).
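For reference, here is the textbook Heckman selection model in generic notation (the symbols are my own labels, not taken from a specific source). The second step adds the inverse Mills ratio λ as an extra regressor in a linear outcome equation, which is why the standard two-step version is built for a continuous outcome:

```latex
% Selection equation (probit), estimated on everyone:
s_i^* = z_i'\gamma + u_i, \qquad s_i = \mathbf{1}[s_i^* > 0]
% Outcome equation, observed only when s_i = 1:
y_i = x_i'\beta + \varepsilon_i
% With (u_i, \varepsilon_i) bivariate normal, Var(u_i) = 1, Var(\varepsilon_i) = \sigma^2, Corr = \rho:
E[y_i \mid x_i, s_i = 1] = x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma), \qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

If z_i contains exactly the same variables as x_i, the correction term is identified only through the nonlinearity of λ, which is close to linear over much of its range, so it tends to be highly collinear with x_i. That is why an exclusion restriction is usually recommended.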
In the end, I really don’t know what to do. I’m quite bad at econometrics and statistics, and I feel a bit out of my depth here, so if someone could take the time to tell me what they think of this situation, I would be very grateful.
Do I really need to test for selection bias, or is it acceptable to ignore it in my situation? And if I do need to test for it, what method could I use, given that both my models have a binary dependent variable and exactly the same explanatory variables?
Thank you for your time!