Comments on Econometrics Beat: Dave Giles' Blog: More Comments on the Use of the LPM

Paul - thanks for the comment. I'm not aware o...

2013-02-02T09:07:02.210-08:00

Paul - thanks for the comment.
I'm not aware of any published results on the "virtues" of the LPM when there are endogenous dummies or in the panel data context. (I'd love to hear of any.)
My comment related to the fact that we know that there can be issues with logit and probit in these cases. It would still take a lot to get me to use the LPM.

Why not set up a small Monte Carlo experiment that captures the basic features of your model/data and compare logit and the LPM?
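A minimal sketch of such an experiment is given here, for illustration only. Everything in it is an assumption rather than anything taken from the thread: the data-generating process is a plain cross-sectional logit with one normal regressor, the coefficient values are arbitrary, and the function names (`fit_logit`, `fit_lpm`, `monte_carlo`) are hypothetical. It compares the average partial effect implied by a logit fit with the LPM slope.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Logit MLE via Newton-Raphson. X must already contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                        # gradient of the log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        beta = beta + np.linalg.solve(info, score)
    return beta


def fit_lpm(X, y):
    """LPM: OLS of the 0/1 outcome on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def monte_carlo(n_reps=200, n_obs=1000, beta_true=(-0.5, 1.0), seed=0):
    """Average the true, logit-implied and LPM partial effects over replications.

    The design (logit DGP, one standard normal regressor) is an assumed
    placeholder; replace it with one that mimics your own model and data.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(beta_true, dtype=float)
    true_ape, logit_ape, lpm_ape = [], [], []
    for _ in range(n_reps):
        x = rng.normal(size=n_obs)
        X = np.column_stack([np.ones(n_obs), x])
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        y = (rng.uniform(size=n_obs) < p).astype(float)
        # Average partial effect of x in this sample under the true model.
        true_ape.append(b[1] * np.mean(p * (1.0 - p)))
        b_hat = fit_logit(X, y)
        p_hat = 1.0 / (1.0 + np.exp(-(X @ b_hat)))
        logit_ape.append(b_hat[1] * np.mean(p_hat * (1.0 - p_hat)))
        # In the LPM the slope is itself the estimated partial effect.
        lpm_ape.append(fit_lpm(X, y)[1])
    return {
        "true": float(np.mean(true_ape)),
        "logit": float(np.mean(logit_ape)),
        "lpm": float(np.mean(lpm_ape)),
    }


if __name__ == "__main__":
    print(monte_carlo())
```

A normal regressor is a favourable case for the LPM slope as an estimate of the average partial effect, so this particular design says little about fixed-effects panels. The comparison is only informative once the design reflects the features of the data at hand (fixed effects, discrete or skewed regressors, and so on).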

In his post on Mostly Harmless Econometrics, Mark ...

2013-02-02T08:56:28.827-08:00

In his post on Mostly Harmless Econometrics, Mark writes "Dave [Giles]’s conclusion is that one should use probit or logit unless there are really good reasons not to (e.g., endogenous dummies or with panel data)."

Searching through this blog I haven't been able to find anything on the virtues of the LPM with endogenous dummies or panel data. Am I missing something?

I'm curious because I am fitting fixed-effects models to dichotomous data, and my choice of model (LPM vs. logit) makes a big difference in the results.

Thanks, Dave, I look forward to seeing the results...

2012-07-16T13:24:37.556-07:00

Thanks, Dave, I look forward to seeing the results. BTW, I'm the same Mark that posed the question at the MHE blog.

Cheers,
Mark

Mark: Thanks for the thoughtful and clear comment....

2012-07-15T09:48:28.391-07:00

Mark: Thanks for the thoughtful and clear comment.

On re-reading HASM, I think you are in fact right when you say that the drawback of the LPM only shows up if you are estimating the HASM misclassification model.

I'm going to do some work of my own on this aspect of the LPM/Logit/Probit debate!

As you know, I still have other issues with the LPM!

Dave, I agree with Alfredo - this exchange is ill...

2012-07-14T14:56:03.597-07:00

Dave,

I agree with Alfredo - this exchange is illuminating as well as interesting!

I want to follow up on the issue of non-identifiability that you raised in your earlier blog entry (Hausman-Abrevaya-Scott-Morton 1998, J. Econometrics). I'm not (yet?) convinced this is as serious as you suggest.

If I understand HASM correctly, misclassification means that the standard probit and logit will generate inconsistent estimates. The advantage of probit and logit shows up only if the researcher wants to estimate parametrically the HASM misclassification model. HASM point out that this isn't possible with the LPM because the parameters of the misclassification model won't be separately identifiable.
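The identification point can be sketched as follows. The notation is assumed here for illustration and is not quoted from HASM: \alpha_0 is the probability that a true 0 is recorded as a 1, \alpha_1 is the probability that a true 1 is recorded as a 0, and F is the link function of the underlying binary choice model with intercept \beta_0 and slopes \beta_1.

```latex
\begin{align*}
\Pr(y_i^{\text{obs}} = 1 \mid x_i)
  &= \alpha_0 \,[1 - F(\beta_0 + x_i'\beta_1)] + (1 - \alpha_1)\, F(\beta_0 + x_i'\beta_1) \\
  &= \alpha_0 + (1 - \alpha_0 - \alpha_1)\, F(\beta_0 + x_i'\beta_1). \\[6pt]
\text{LPM, } F(z) = z: \quad
\Pr(y_i^{\text{obs}} = 1 \mid x_i)
  &= \underbrace{\alpha_0 + (1 - \alpha_0 - \alpha_1)\,\beta_0}_{\text{composite intercept}}
   + x_i' \underbrace{(1 - \alpha_0 - \alpha_1)\,\beta_1}_{\text{composite slopes}}.
\end{align*}
```

With a linear F, the data pin down only the composite intercept and the composite slopes, so \alpha_0, \alpha_1, \beta_0 and \beta_1 cannot be recovered separately. With a nonlinear F such as the probit or logit link, it is the curvature of F that allows them to be separated.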

Is this such a big problem? You're encouraging people to use probits or logits instead of the LPM because of the misclassification issue. If misclassification is present and people go ahead and estimate a probit or logit anyway, they'll get inconsistent estimates. No obvious advantage over the LPM on these grounds. If people want to estimate a misclassification model they can't use the LPM because the parameters can't be identified. But this drawback of the LPM only shows up if the researcher is estimating the HASM misclassification model.

So for researchers who are estimating a simple binary choice model, I don't see how identification in the LPM case is an issue. If misclassification is not present, non-identification is not an issue with the LPM; if misclassification is present, probit, logit and the LPM will all be inconsistent.

Even if misclassification is present, I think perhaps you're overstating the problem. You put it like this:

"So, ask yourself the following question:"

"When I have binary choice data, can I be absolutely sure that every one of the observations has been classified correctly into zeroes and ones?"

"If your answer is "No", then forget about using the LPM. You'll just be trying to do the impossible - namely, estimate parameters that aren't identified."

HASM point out that their model is indistinguishable from a model with heterogeneous responses: a fraction always answers "yes", a fraction always answers "no", and the rest follow the standard binary choice model. Take a deliberately silly example: say I have a sample of 100k persons, one of whom always says "yes". The LPM estimates will be very slightly contaminated by ignoring the heterogeneity from this single person. Does this really mean I should therefore "forget about using the LPM"? That would seem like an overreaction!
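To put a rough number on "very slightly", here is a small simulation sketch of the silly example. The design is entirely assumed: a true linear probability model with intercept 0.2 and slope 0.5 on a uniform regressor, with person 0 forced to answer "yes". The function names `lpm_coefs` and `always_yes_contamination` are hypothetical.

```python
import numpy as np


def lpm_coefs(x, y):
    """OLS intercept and slope from regressing the 0/1 outcome on x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def always_yes_contamination(n=100_000, seed=0):
    """Return LPM coefficients without and with one 'always yes' respondent."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    p = 0.2 + 0.5 * x                      # assumed true response probabilities
    y_clean = (rng.uniform(size=n) < p).astype(float)
    y_obs = y_clean.copy()
    y_obs[0] = 1.0                         # person 0 answers "yes" regardless of x
    return lpm_coefs(x, y_clean), lpm_coefs(x, y_obs)


if __name__ == "__main__":
    clean, contaminated = always_yes_contamination()
    print("clean:       ", clean)
    print("contaminated:", contaminated)
    print("difference:  ", contaminated - clean)
```

Flipping at most one outcome out of 100k moves the OLS coefficients by an amount far smaller than their sampling error in a design like this. This only illustrates the size of the contamination in the silly example; it does not address whether the misclassification parameters are identified.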

Am I missing something here?

--Mark