## Friday, June 1, 2012

### Another Gripe About the Linear Probability Model

So you're still thinking of using a Linear Probability Model (LPM) - also known in the business as good old OLS - to estimate a binary dependent variable model?

Well, I'm stunned!

Yes, yes, I've heard all of the "justifications" (excuses) for using the LPM, as opposed to using a Logit or Probit model. Here are a few of them:

• It's computationally simpler.
• It's easier to interpret the "marginal effects".
• It avoids the risk of mis-specification of the "link function".
• There are complications with Logit or Probit if you have endogenous dummy regressors.
• The estimated marginal effects from the LPM, Logit and Probit models are usually very similar, especially if you have a large sample size.

Oh really? That don't impress me much!

Why not? Well, in almost all circumstances, the LPM yields biased and inconsistent estimates. You didn't know that? Then take a look at the paper by Horrace and Oaxaca (2006), and some previous results given by Amemiya (1977)!

It's not the bias that worries me in this particular context - it's the inconsistency. After all, the MLEs for the Logit and Probit models are also biased in finite samples - but they're consistent. Given the sample sizes that we usually work with when modelling binary data, it's consistency and asymptotic efficiency that are of primary importance.

Why would you choose to use a modelling/estimation strategy that will give you the wrong answer, with probability one, even if you have an infinitely large sample? Wouldn't you rather use one that will give you the right answer, with certainty, if the sample size is very, very large?

If you're not feeling up to reading the Horrace and Oaxaca paper - bad news is always unpleasant - here's their key result. Let xi denote the ith observation on the vector of covariates, and let β be the vector of associated coefficients. So, xiβ is the true value of the ith element of the "prediction vector".

OLS estimation of the Linear Probability Model will be both biased and inconsistent, unless it happens to be the case that 0 ≤ xiβ ≤ 1, for every i.

How likely is that? In addition, notice that xiβ isn't observed in practice.

The bottom line, in their words (H & O, p. 326):
"Although it is theoretically possible for OLS on the LPM to yield unbiased estimation, this generally would require fortuitous circumstances. Furthermore, consistency seems to be an exceedingly rare occurrence as one would have to accept extraordinary restrictions on the joint distribution of the regressors. Therefore, OLS is frequently a biased estimator and almost always an inconsistent estimator of the LPM."

Perhaps a small Monte Carlo simulation experiment will be helpful?

Here's what I've done. I generated binary data for my dependent variable, y, as follows:

yi* = β1 + β2xi + εi     ;        εi ~ iid N[0 , 1]

yi = 1  ;  if yi* > 0
yi = 0  ;  if yi* ≤ 0       ;   i = 1, 2, ..., n

Then, I estimated both a Probit model and a LPM, using the y and x data. In each case I computed the estimated marginal effect for x. (In the case of the LPM, this is just the OLS estimate of the coefficient of x.) Of course, as I know the true values of β1 and β2 in the data-generating process, I also know the true value of the marginal effect.
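For reference, here is a sketch of the marginal effect implied by the Probit data-generating process above. Φ and φ denote the standard normal c.d.f. and density. The evaluation point x_0 is an assumption for illustration: the post doesn't say whether the effect is evaluated at the sample mean of x or averaged over the sample.

```latex
\begin{align*}
\Pr(y_i = 1 \mid x_i) &= \Pr(\varepsilon_i > -\beta_1 - \beta_2 x_i) = \Phi(\beta_1 + \beta_2 x_i), \\
\left.\frac{\partial \Pr(y = 1 \mid x)}{\partial x}\right|_{x = x_0} &= \beta_2\,\phi(\beta_1 + \beta_2 x_0).
\end{align*}
```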

I've focused on marginal effects because they are more interesting than the parameters themselves in a Logit or Probit model. In addition, unlike the parameter estimates, they can be compared meaningfully between the LPM and Logit or Probit models.

For a fixed sample size, n, I replicated this 5,000 times. Various sample sizes were considered.
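For readers without EViews, an experiment of this kind might be sketched in Python roughly as follows. This is not the program used for the post. The parameter values (β1 = 0.5, β2 = 1), the N(0, 1) regressor, the evaluation of the marginal effect at the sample mean of x, and all function names are assumptions made for illustration. The Probit fit uses a hand-rolled Fisher scoring loop so that only the standard library is needed.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: supplies the c.d.f. and density


def simulate(n, b1, b2, rng):
    """Draw (x, y) from the latent-variable Probit DGP; x ~ N(0, 1) is assumed."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if b1 + b2 * xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, y


def ols_slope(x, y):
    """LPM marginal effect: OLS slope from regressing y on a constant and x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def probit_mle(x, y, max_iter=100, tol=1e-10):
    """Probit MLE of (b1, b2) by Fisher scoring, starting from zero."""
    b1 = b2 = 0.0
    for _ in range(max_iter):
        g1 = g2 = i11 = i12 = i22 = 0.0
        for xi, yi in zip(x, y):
            z = b1 + b2 * xi
            # Clamp the fitted probability away from 0 and 1 for stability.
            p = min(max(STD_NORMAL.cdf(z), 1e-10), 1.0 - 1e-10)
            d = STD_NORMAL.pdf(z)
            r = (yi - p) * d / (p * (1.0 - p))  # score contribution
            w = d * d / (p * (1.0 - p))         # expected-information contribution
            g1 += r
            g2 += r * xi
            i11 += w
            i12 += w * xi
            i22 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = i11 * i22 - i12 * i12
        s1 = (i22 * g1 - i12 * g2) / det
        s2 = (i11 * g2 - i12 * g1) / det
        b1, b2 = b1 + s1, b2 + s2
        if abs(s1) + abs(s2) < tol:
            break
    return b1, b2


def probit_me(b1, b2, x0):
    """Probit marginal effect of x, evaluated at the point x0."""
    return b2 * STD_NORMAL.pdf(b1 + b2 * x0)


def monte_carlo(n, reps, b1=0.5, b2=1.0, seed=12345):
    """Mean Probit ME, mean LPM ME and mean true ME over `reps` replications."""
    rng = random.Random(seed)
    probit, lpm, truth = [], [], []
    for _ in range(reps):
        x, y = simulate(n, b1, b2, rng)
        x0 = fmean(x)  # assumed evaluation point: the sample mean of x
        h1, h2 = probit_mle(x, y)
        probit.append(probit_me(h1, h2, x0))
        lpm.append(ols_slope(x, y))
        truth.append(probit_me(b1, b2, x0))
    return fmean(probit), fmean(lpm), fmean(truth)


if __name__ == "__main__":
    # Far fewer replications than the 5,000 used in the post, to keep this quick.
    for n in (100, 1000):
        pm, lm, tm = monte_carlo(n, reps=50)
        print(f"n={n}: Probit ME={pm:.4f}  LPM ME={lm:.4f}  true ME={tm:.4f}")
```

Under these assumed settings the true effect at a sample mean near zero is φ(0.5) ≈ 0.352, while the OLS slope should settle near 0.265, so the gap shouldn't close as n grows. The numbers won't match the tables in this post, whose design may differ.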

The EViews workfile and program file are on the code page for this blog. (I'm sorry that I didn't do this one in R! However, the EViews program file is also there as a text file for ease of viewing.)

Here are some summary results (Probit followed by LPM), first for n = 100:

Comparing the means of the sampling distributions for the estimated marginal effects with the true value of 0.3526, you can see that the LPM (OLS) result is biased downwards, whereas the Probit result has virtually no bias.

Remember, this is for a sample size of n = 100. Now let's increase the sample size to n = 250:

Notice that the mean of the sampling distribution for the Probit-estimated marginal effect is very close to the true value of 0.2963. In contrast, the OLS/LPM estimator of the marginal effect is biased upwards (being 0.324, on average).

Now, let's explore the asymptotics a little by considering n = 5,000, and n = 10,000:

We can see that the marginal effect implied by OLS estimation of the LPM is converging to a value of approximately 0.32, as n increases. The standard deviation of the sampling distribution for that estimator is getting smaller and smaller. The inconsistency of the estimator is being driven by asymptotic bias.
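One way to see where such a limiting value comes from is sketched below. It assumes the Probit data-generating process used here and treats x as random with finite, nonzero variance (an assumption, as the design for x isn't stated in the post). Φ and φ are the standard normal c.d.f. and density, and x_0 is whatever point the marginal effect is evaluated at. The OLS slope converges to the population linear-projection coefficient:

```latex
\begin{align*}
\operatorname{plim}_{n \to \infty} \hat{\beta}_2^{\,OLS}
  &= \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
   = \frac{\operatorname{Cov}\bigl(x, \Phi(\beta_1 + \beta_2 x)\bigr)}{\operatorname{Var}(x)}, \\
\text{true marginal effect at } x_0
  &= \beta_2\,\phi(\beta_1 + \beta_2 x_0).
\end{align*}
```

The first expression depends on the whole distribution of x, while the second depends only on x_0, so there is no general reason for the two to agree.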

On the other hand, the marginal effect implied by maximum likelihood estimation of the Probit model is converging to the true marginal effect - even though the latter value is changing as n increases. In fact, this convergence is quite rapid. The (small) bias vanishes, and the standard deviation of the estimator continues to decrease, as the sample size grows.

So much for the last of the "reasons" for using the LPM in the bulleted list near the start of this post!

When we look at the proportion of sample values for which the true value of (β1 + β2xi) lies in the unit interval, the results for n = 100, 250, 5,000, and 10,000 are 20%, 15.2%, 19.2%, and 19.1% respectively. These values are a long way from 100%.
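A check of this kind takes only a few lines. The sketch below is illustrative: the function name, the parameter values and the N(0, 1) regressor are assumptions, not the settings behind the percentages quoted above.

```python
import random


def share_in_unit_interval(x, b1, b2):
    """Fraction of observations whose true index b1 + b2*x lies in [0, 1]."""
    inside = sum(1 for xi in x if 0.0 <= b1 + b2 * xi <= 1.0)
    return inside / len(x)


if __name__ == "__main__":
    rng = random.Random(2012)
    x = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical regressor
    print(f"share in [0, 1]: {share_in_unit_interval(x, b1=0.5, b2=1.0):.3f}")
```

In a simulation the true β1 and β2 are known, so the share can be computed directly. With real data only the fitted values are available, which is the practical difficulty noted earlier.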

So, recalling Horrace and Oaxaca's results, that's why OLS estimation of the Linear Probability Model is both biased and inconsistent here. The maximum likelihood estimator is consistent, and its bias is quite small in finite samples, in this experiment.

Yes, there are some situations where I'd consider using an LPM. For example, with a complex panel data-set, or when there are endogenous dummy covariates. However, in the typical case of cross-section data, and no other complications, you'll have to work pretty hard to convince me!

References

Amemiya, T., 1977. Some theorems in the linear probability model. International Economic Review, 18, 645–650.

Horrace, W. C. & R. L. Oaxaca, 2006. Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters, 90, 321–327.

1. Dave: Can you suggest a reference for the endogenous dummy covariate case?

1. Brian: See J. D. Angrist, "Estimation of Limited Dependent Variable Models With Dummy Endogenous Regressors: Simple Strategies for Empirical Practice", Journal of Business & Economic Statistics, 2001, 19, 2-28 (includes discussion & response). Email me directly if you have trouble getting this.

2. Would you still do this if heteroscedasticity was an issue?

1. Dimitriy: We'd need to model the het. in the Logit or Probit model, because we know that the MLE for the PARAMETERS in these models is inconsistent if there is het. I think you've already seen my earlier post on this point. I'd still avoid the LPM!

3. This post has some very useful information about LPM. Thanks for the effort you put into this and your other interesting (and often entertaining!) blog posts.

I try to do my bit by being as DEMONSTRATIVE as possible in telling my Econ 345 students why they must use probit or logit instead of LPM.

5. Well, Dave, you mention that you would consider using the LPM with panel data, but not with cross-section data... but there are many cases of using lots of FE in cross-section data, and I think it is *really* hard to defend non-linear models in cases such as these. FE account for variation in the data in a completely general way -- the non-linear model relies on the functional form.

Also, your Monte Carlo example is a bit of a cherry pick; the LPM is the "wrong" model in this case. MLE will be inconsistent if the probit model is wrong, too.

1. Scott: Thanks for the comments. First one - fair enough.

Second one - Yes, MLE will also be inconsistent if the Probit DGP is wrong. However, other work I've played around with shows that the asymptotic bias associated with the LPM is often greater than that associated with an incorrect nonlinear model. That's the case, for example, when the data are generated according to a Probit model and we then fit either an LPM or a Logit model.

This is something I'm currently working on more seriously - see one of my responses at http://davegiles.blogspot.ca/2012/07/more-comments-on-use-of-lpm.html#comment-form

This certainly deserves proper investigation.

7. Hi Dave,
What about when you have many interacted independent variables and a binary dependent variable (for example, interacting many independent variables with a set of dummies)? Given that calculating marginal effects of interactions is complex when there are so many, could you be justified in using the LPM in this situation?
Rose

1. Rose - I have some sympathy with that, but there's a good literature on how to do things properly, even in that case. For example:

http://www.unc.edu/~enorton/AiNorton.pdf

and

http://www.sciencedirect.com/science/article/pii/S0165176510000777

8. Dave, if I use a logistic link function but minimize MSE instead of using MLE, do I still have the same problems?

9. Personally, I'd see more sense in that than just using OLS.

1. Thanks. A follow-up if I could...

If I use the logistic link function, then maximizing (1-a)*(1-p) + ap (MLE?) in the binary response case seems to be identical to minimizing absolute error. Is this true?

10. How did your estimates of B2 turn out?

11. Try simulating a model in which there is a true "absolute treatment effect", e.g.

y = a0 + a1*x + e in which e ~ bernoulli

Then run LPM and logit.

Better yet add a covariate to the equation above (e.g. a2*w) and show how logit will suggest that "a1" varies with w when it actually doesn't.

12. So LPM estimates are inconsistent for the ME if the true model is PROBIT ...And the probit estimator should be preferred if the DGP is Probit.... BIG SURPRISE THX Prof!

1. Thanks for your sarcastic comment. You'd be better off directing your energy to berating those of your colleagues who keep ignoring this fundamental point in their applied work. Goodness knows there are enough of them around.

13. Very useful post, since I am now dealing with a binary dependent variable. I am still working through your other post about robust standard errors for Probit and Logit. Very helpful!! Tony

1. Great posting! I have a quick question for you. What if there is no a priori reason for preferring the probit model (e.g., we are not doing a simulation, and so don't know it to be the true model)? How can we choose between the probit model and the logit model?

2. I'd discriminate between the two using one of the available information criteria. A useful paper on this is: G. Chen & H. Tsurumi, "Probit and Logit Model Selection", Communications in Statistics - Theory & Methods, 2010, 40, 159-175. Here's the abstract:

Abstract:
"Monte Carlo experiments are conducted to compare the Bayesian and sample theory model selection criteria in choosing the univariate probit and logit models. We use five criteria: the deviance information criterion (DIC), predictive deviance information criterion (PDIC), Akaike information criterion (AIC), weighted, and unweighted sums of squared errors. The first two criteria are Bayesian while the others are sample theory criteria. The results show that if data are balanced none of the model selection criteria considered in this article can distinguish the probit and logit models. If data are unbalanced and the sample size is large the DIC and AIC choose the correct models better than the other criteria. We show that if unbalanced binary data are generated by a leptokurtic distribution the logit model is preferred over the probit model. The probit model is preferred if unbalanced data are generated by a platykurtic distribution. We apply the model selection criteria to the probit and logit models that link the ups and downs of the returns on S&P500 to the crude oil price."

14. Although the sarcasm of Anonymous's June 10 comment isn't warranted, I have to agree with the general sentiment--your simulation isn't very fair since it is prima facie mis-specified for all models except the probit. There is a data generating model that yields the linear probability model (or, more exactly, a Bernoulli glm with a linear link). Maybe you should be comparing to that. See http://stats.stackexchange.com/questions/81789/if-%CF%B5-is-uniformly-distributed-then-a-linear-probability-model-is-appropriate-c