Alfredo drew my attention to Steve Pischke's reply to a question raised by Mark Schaffer on the Mostly Harmless Econometrics blog. The post was titled "Probit Better than LPM?" The question related to my own posts (here, here, and here, in reverse order) on this blog concerning the choice between OLS (the Linear Probability Model - LPM) and the Logit/Probit models for binary data.
Thanks, Alfredo, as this isn't a blog I follow.
Alfredo asked: "Would you care to respond? I feel like this is truly an exchange from which a lot of people can learn".
Well, I do have a few follow-up comments, and here they are.
First, in the most recent of my posts I pointed out that if we make any mistakes at all in classifying the zeroes and ones for our binary dependent variable, then the LPM has a fatal flaw. We can't identify the parameters, and hence the marginal effects. That's really serious!
On the other hand, in the face of this problem the parameters and marginal effects are still identifiable in the Logit and Probit models.
Given the likelihood of some element of measurement error, I think this is a strike against the LPM.
Steve acknowledges that it would be nice to see more work done on this type of measurement error problem, and I agree totally.
Second, Steve says, "we care about marginal effects" (rather than the underlying structural parameters). I agree entirely, which is precisely why my post here focused on marginal effects.
He also says, "Obviously, the LPM won’t give the true marginal effects from the right nonlinear model. But then, the same is true for the “wrong” nonlinear model! The fact that we have a Probit, a Logit, and the LPM is just a statement to the fact that we don’t know what the “right” model is."
I agree with the first two sentences, but I'd add the following comment. The LPM can't give consistent estimates of the true marginal effects, period (by Slutsky's Theorem), because it can't give consistent estimates of the underlying parameters. Why not? Because the true data-generating process can't be a linear regression model with normal errors.
So even if you want to reject the use of Logit or Probit on the grounds that they involve "arbitrary" nonlinear specifications, how can you defend using a different specification (the LPM) that is clearly incorrect?
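To make this concrete, here is a small simulation sketch (my own illustration in Python, not part of the original exchange; the parameter values and the exponential regressor are invented for the example). The data come from a probit DGP, and the OLS slope from the LPM is compared with the true average marginal effect. A skewed regressor is used deliberately, since with a symmetric normal regressor the two quantities can happen to coincide:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Probit DGP with a deliberately skewed regressor (illustrative values).
a, b = -1.0, 1.0
x = rng.exponential(1.0, n)                    # skewed regressor
y = (a + b * x + rng.normal(size=n) > 0).astype(float)

# LPM: OLS of y on a constant and x.
X = np.column_stack([np.ones(n), x])
lpm_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

# True average marginal effect under the probit DGP: E[b * phi(a + b*x)].
phi = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
true_ame = np.mean(b * phi(a + b * x))

print(f"LPM slope: {lpm_slope:.3f}")
print(f"True AME:  {true_ame:.3f}")
```

With this design the LPM slope settles noticeably below the true average marginal effect, illustrating that the OLS slope is not, in general, a consistent estimator of the AME when the true model is nonlinear.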
© 2012, David E. Giles
Dave,
I agree with Alfredo - this exchange is illuminating as well as interesting!
I want to follow up on the issue of non-identifiability that you raised in your earlier blog entry (Hausman-Abrevaya-Scott-Morton 1998, J. Econometrics). I'm not (yet?) convinced this is as serious as you suggest.
If I understand HASM correctly, misclassification means that the standard probit and logit will generate inconsistent estimates. The advantage of probit and logit shows up only if the researcher wants to estimate parametrically the HASM misclassification model. HASM point out that this isn't possible with the LPM because the parameters of the misclassification model won't be separately identifiable.
Is this such a big problem? You're encouraging people to use probits or logits instead of the LPM because of the misclassification issue. If misclassification is present and people go ahead and estimate a probit or logit anyway, they'll get inconsistent estimates. No obvious advantage over the LPM on these grounds. If people want to estimate a misclassification model they can't use the LPM because the parameters can't be identified. But this drawback of the LPM only shows up if the researcher is estimating the HASM misclassification model.
So for researchers who are estimating a simple binary choice model, I don't see how identification in the LPM case is an issue. If misclassification is not present, non-identification is not an issue with the LPM; if misclassification is present, probit, logit and the LPM will all be inconsistent.
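Mark's point - that a naive probit is itself inconsistent once misclassification is present - is easy to check numerically. Here is a quick sketch (my own illustration; the 5% flip rate and parameter values are invented for the example):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50_000
a, b = 0.0, 1.0                      # true probit index parameters (illustrative)
x = rng.normal(size=n)
y_true = (a + b * x + rng.normal(size=n) > 0).astype(float)

# Misclassify 5% of observations, flipping zeroes and ones at random.
flip = rng.random(n) < 0.05
y_obs = np.where(flip, 1.0 - y_true, y_true)

def probit_fit(y, x):
    """Naive probit MLE that ignores any misclassification."""
    def negll(theta):
        idx = theta[0] + theta[1] * x
        return -np.sum(y * norm.logcdf(idx) + (1 - y) * norm.logcdf(-idx))
    return minimize(negll, x0=np.zeros(2), method="BFGS").x

slope_clean = probit_fit(y_true, x)[1]
slope_noisy = probit_fit(y_obs, x)[1]

print(f"probit slope, clean labels:     {slope_clean:.3f}")
print(f"probit slope, 5% misclassified: {slope_noisy:.3f}")
```

The slope estimated from the misclassified labels is attenuated well below the true value, consistent with Mark's observation that ignoring misclassification leaves probit (and logit) inconsistent too.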
Even if misclassification is present, I think perhaps you're overstating the problem. You put it like this:
"So, ask yourself the following question:"
"When I have binary choice data, can I be absolutely sure that every one of the observations has been classified correctly into zeroes and ones?"
"If your answer is "No", then forget about using the LPM. You'll just be trying to do the impossible - namely, estimate parameters that aren't identified."
HASM point out that their model is indistinguishable from a model with heterogeneous responses: a fraction always answers "yes", a fraction always answers "no", and the rest follow the standard binary choice model. Take a deliberately silly example: say I have a sample of 100k persons, one of whom always says "yes". The LPM estimates will be very slightly contaminated by ignoring the heterogeneity from this single person. Does this really mean I should therefore "forget about using the LPM"? That would seem like an overreaction!
Am I missing something here?
--Mark
Mark: Thanks for the thoughtful and clear comment.
On re-reading HASM, I think you are in fact right when you say that the drawback of the LPM only shows up if you are estimating the HASM misclassification model.
I'm going to do some work of my own on this aspect of the LPM/Logit/Probit debate!
As you know, I still have other issues with the LPM!
Thanks, Dave, I look forward to seeing the results. BTW, I'm the same Mark that posed the question at the MHE blog.
Cheers,
Mark
In his post on Mostly Harmless Econometrics, Mark writes "Dave [Giles]'s conclusion is that one should use probit or logit unless there are really good reasons not to (e.g., endogenous dummies or with panel data)."
Searching through this blog I haven't been able to find anything on the virtues of the LPM with endogenous dummies or panel data. Am I missing something?
I'm curious because I am fitting fixed-effects models to dichotomous data, and my choice of model (LPM vs. logit) makes a big difference in the results.
Paul - thanks for the comment.
I'm not aware of any published results on the "virtues" of the LPM when there are endogenous dummies or in the panel data context. (I'd love to hear of any.)
My comment related to the fact that we know that there can be issues with logit and probit in these cases. It would still take a lot to get me to use the LPM.
Why not set up a small Monte Carlo experiment that captures the basic features of your model/data and compare logit and the LPM?
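A skeleton for such an experiment might look like the following (Python; a cross-sectional logit DGP for simplicity - Paul would replace the data-generating step with his own fixed-effects structure, and the parameter values here are arbitrary). Each replication draws a sample, then records the LPM slope, the logit average marginal effect, and the sample-average true marginal effect:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, reps = 2_000, 200
a, b = 0.5, 1.0                    # true logit index parameters (arbitrary)

def logit_ame(y, x):
    """Logit MLE by direct likelihood maximization, then the average marginal effect."""
    X = np.column_stack([np.ones_like(x), x])
    def negll(theta):
        idx = X @ theta
        return -np.sum(y * idx - np.logaddexp(0.0, idx))
    theta = minimize(negll, x0=np.zeros(2), method="BFGS").x
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return theta[1] * np.mean(p * (1.0 - p))

lpm_est, logit_est, true_ame = [], [], []
for _ in range(reps):
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(a + b * x)))
    y = (rng.random(n) < p).astype(float)
    X = np.column_stack([np.ones(n), x])
    lpm_est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])   # LPM slope
    logit_est.append(logit_ame(y, x))                          # logit AME
    true_ame.append(b * np.mean(p * (1.0 - p)))                # true sample AME

print(f"true AME (avg):  {np.mean(true_ame):.3f}")
print(f"logit AME (avg): {np.mean(logit_est):.3f}")
print(f"LPM slope (avg): {np.mean(lpm_est):.3f}")
```

The interesting comparisons are the bias and dispersion of the two estimators across replications once the DGP mimics the features of the actual data (fixed effects, skewed regressors, extreme cell probabilities, and so on).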