Econometrics Beat: Dave Giles' Blog: Another Gripe About the Linear Probability Model

Friday, June 1, 2012

Another Gripe About the Linear Probability Model

NOTE: This post was revised significantly on 15 February, 2019, as a result of correcting an error in my original EViews code. The code file and the EViews workfile that are available elsewhere on separate pages of this blog were also revised. I would like to thank Federico Belotti for drawing my attention to the earlier coding error.

So you're still thinking of using a Linear Probability Model (LPM) - also known in the business as good old OLS - to estimate a binary dependent variable model?

Well, I'm stunned!

Yes, yes, I've heard all of the "justifications" (excuses) for using the LPM, as opposed to using a Logit or Probit model. Here are a few of them:

- It's computationally simpler.
- It's easier to interpret the "marginal effects".
- It avoids the risk of mis-specification of the "link function".
- There are complications with Logit or Probit if you have endogenous dummy regressors.
- The estimated marginal effects from the LPM, Logit and Probit models are usually very similar, especially if you have a large sample size.

Oh really? That don't impress me much!

Why not? Well, in almost all circumstances, the LPM yields biased and inconsistent estimates. You didn't know that? Then take a look at the paper by Horrace and Oaxaca (2006), and some previous results given by Amemiya (1977)!

It's not the bias that worries me in this particular context - it's the inconsistency. After all, the MLEs for the Logit and Probit models are also biased in finite samples - but they're consistent. Given the sample sizes that we usually work with when modelling binary data, it's consistency and asymptotic efficiency that are of primary importance.

Why would you choose to use a modelling/estimation strategy that will give you the wrong answer, with probability one, even if you have an infinitely large sample? Wouldn't you rather use one that will give you the right answer, with certainty, if the sample size is very, very large?

If you're not feeling up to reading the Horrace and Oaxaca paper - bad news is always unpleasant - here's their key result. Let x_i denote the i^th observation on the vector of covariates, and let β be the vector of associated coefficients. So, x_iβ is the true value of the i^th element of the "prediction vector".

OLS estimation of the Linear Probability Model will be both biased and inconsistent, unless it happens to be the case that 0 ≤ x_iβ ≤ 1, for every i.

How likely is that? In addition, notice that x_iβ isn't observed, in practice.

The bottom line, in their words (H & O, p.326):

"Although it is theoretically possible for OLS on the LPM to yield unbiased estimation, this generally would require fortuitous circumstances. Furthermore, consistency seems to be an exceedingly rare occurrence as one would have to accept extraordinary restrictions on the joint distribution of the regressors. Therefore, OLS is frequently a biased estimator and almost always an inconsistent estimator of the LPM."

Perhaps a small Monte Carlo simulation experiment will be helpful?

Here's what I've done. I generated binary data for my dependent variable, y, as follows:

y_i* = β₁ + β₂x_i + ε_i ;  ε_i ~ iid N[0, 1]

y_i = 1 ;  if y_i* > 0
y_i = 0 ;  if y_i* ≤ 0 ;  i = 1, 2, ..., n
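For readers who would like to see this data-generating process in code, here is a minimal Python sketch (not the EViews program used for this post). The function name `simulate_probit_data`, the parameter values β₁ = 0.5 and β₂ = 1.0, the seed, and the choice of x ~ N(0, 1) are all assumptions made purely for illustration; the post does not state how x was generated or which coefficient values were used.

```python
import numpy as np

def simulate_probit_data(n, beta1, beta2, rng):
    """Draw (x, y) from the latent-variable process described above."""
    # ASSUMPTION: x ~ N(0, 1). The post does not say how x was generated.
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    eps = rng.normal(size=n)            # eps_i ~ iid N[0, 1]
    y_star = beta1 + beta2 * x + eps    # latent variable y_i*
    y = (y_star > 0).astype(int)        # y_i = 1 if y_i* > 0, else 0
    return x, y

# Hypothetical parameter values and seed, chosen only for this illustration.
rng = np.random.default_rng(12345)
x, y = simulate_probit_data(n=100, beta1=0.5, beta2=1.0, rng=rng)
print("share of ones in y:", y.mean())
```

Only y and x would be handed to the estimators; the latent y* stays unobserved, just as it would with real data.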

Then, I estimated both a Probit model and a LPM, using the y and x data. In each case I computed the estimated marginal effect for x. (In the case of the LPM, this is just the OLS estimate of the coefficient of x.) Of course, as I know the true values of β₁ and β₂ in the data-generating process, I also know the true value of the marginal effect. For the Probit model I calculated the marginal effect at the sample mean of x. An alternative would have been to average the partial effects, computed at each sample point. This is something I'll write a post about at a later date.
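To make the two Probit summary measures concrete, they can be written out as follows. These are the standard textbook expressions, with Φ and φ denoting the standard normal c.d.f. and density; the labels ME and AME are shorthand introduced here, not notation from the post.

```latex
% Probit response probability and its derivative with respect to x
P(y_i = 1 \mid x_i) = \Phi(\beta_1 + \beta_2 x_i), \qquad
\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_i} = \phi(\beta_1 + \beta_2 x_i)\,\beta_2

% Marginal effect evaluated at the sample mean of x (the measure used in this post)
\widehat{ME}(\bar{x}) = \phi(\hat{\beta}_1 + \hat{\beta}_2 \bar{x})\,\hat{\beta}_2

% Average of the partial effects computed at each sample point (the alternative)
\widehat{AME} = \frac{1}{n} \sum_{i=1}^{n} \phi(\hat{\beta}_1 + \hat{\beta}_2 x_i)\,\hat{\beta}_2
```

The "true" marginal effect at the mean is the second expression with the true β₁ and β₂ in place of the estimates, which is why it depends on the sample mean of x.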

I've focused on marginal effects because they are more interesting than the parameters themselves in a Logit or Probit model. In addition, unlike the parameter estimates, they can be compared meaningfully between the LPM and Logit or Probit models.

For a fixed sample size, n, I replicated this 5,000 times. Various sample sizes were considered.
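The replication step might be sketched in Python roughly as below. This is an illustrative stand-in, not the EViews program from the code page. It assumes x ~ N(0, 1) drawn once and held fixed across replications, hypothetical values β₁ = 0.5 and β₂ = 1.0, and a hand-rolled Fisher-scoring Probit fit (`fit_probit`) so that only NumPy is needed. None of these choices come from the post, so the numbers it prints should not be expected to match the ones reported here.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, max_iter=50, tol=1e-10):
    """Probit MLE by Fisher scoring; X must include a column of ones."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = X @ b
        p = np.clip(norm_cdf(z), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(z)
        score = X.T @ (d * (y - p) / (p * (1.0 - p)))
        weights = d**2 / (p * (1.0 - p))
        info = X.T @ (X * weights[:, None])
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def monte_carlo(n, reps, beta1, beta2, seed=0):
    """Return (true ME at mean of x, mean Probit ME, mean LPM slope)."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: x ~ N(0, 1), drawn once and held fixed over replications.
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    xbar = x.mean()
    true_me = norm_pdf(beta1 + beta2 * xbar) * beta2
    me_probit = np.empty(reps)
    me_lpm = np.empty(reps)
    for r in range(reps):
        eps = rng.normal(size=n)
        y = (beta1 + beta2 * x + eps > 0).astype(float)
        b = fit_probit(X, y)
        me_probit[r] = norm_pdf(b[0] + b[1] * xbar) * b[1]   # ME at mean of x
        me_lpm[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]  # OLS slope
    return true_me, me_probit.mean(), me_lpm.mean()

# Hypothetical settings; far fewer replications than the 5,000 used in the post.
true_me, probit_mean, lpm_mean = monte_carlo(n=250, reps=200, beta1=0.5, beta2=1.0)
print("true ME at mean of x:", true_me)
print("mean Probit ME:      ", probit_mean)
print("mean LPM (OLS) slope:", lpm_mean)
```

Raising `reps` to 5,000 and varying `n` would mimic the design described above, at the cost of a longer run time.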

The EViews workfile and program file are on the code page for this blog. (I'm sorry that I didn't do this one in R! However, the EViews program file is also there as a text version of it for ease of viewing.)

Here are some summary results (Probit followed by LPM), first for n = 100

Comparing the means of the sampling distributions (0.6115 and 0.3197) for the estimated marginal effects with the true value of 0.6051, you can see that the LPM (OLS) result is biased downwards, whereas the Probit result has virtually no bias.

Remember, this is for a sample size of n = 100. Now let's increase the sample size to n = 250:

Notice that the mean of the sampling distribution for the Probit-estimated marginal effect (0.5228) is very close to the true value of 0.5263. In contrast, the OLS/LPM estimator of the marginal effect is biased downwards (being 0.3241, on average).

Now, let's explore the asymptotics a little by considering n = 5,000, and n = 10,000:

We can see that the marginal effect implied by OLS estimation of the LPM is converging to a value of approximately 0.32, as n increases. The standard deviation of the sampling distribution for that estimator is getting smaller and smaller. The inconsistency of the estimator is being driven by asymptotic bias.

On the other hand, the marginal effect implied by maximum likelihood estimation of the Probit model is converging to the true marginal effect - even though the latter value is changing as n increases. In fact, this convergence is quite rapid. The (small) bias vanishes, and the standard deviation of the estimator continues to decrease, as the sample size grows.

So much for the last of the "reasons" for using the LPM in the bulleted list near the start of this post!

When we look at the proportion of sample values for which the true value of (β₁ + β₂x_i) lies in the unit interval, the results for n = 100, 250, 5,000, and 10,000 are 20%, 15.2%, 19.2%, and 19.1% respectively. These values are a long way from 100%.
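A proportion of this kind is simple to compute once the true coefficients are known, as they are in a simulation. The Python sketch below shows one way to do it; the helper name `share_in_unit_interval`, the toy x values, and the coefficient values are hypothetical and serve only to demonstrate the calculation.

```python
import numpy as np

def share_in_unit_interval(x, beta1, beta2):
    """Fraction of observations with 0 <= beta1 + beta2 * x_i <= 1."""
    index = beta1 + beta2 * np.asarray(x, dtype=float)
    return np.mean((index >= 0.0) & (index <= 1.0))

# Toy data (hypothetical): the index values are -1.5, 0, 0.5, 0.75, 1.0, 3.5,
# so four of the six observations fall inside the unit interval.
x_demo = np.array([-2.0, -0.5, 0.0, 0.25, 0.5, 3.0])
share = share_in_unit_interval(x_demo, beta1=0.5, beta2=1.0)
print(share)
```

With real data the true coefficients are unknown, so this check is only available in a simulation setting.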

So, recalling Horrace and Oaxaca's results, that's why OLS estimation of the Linear Probability Model is both biased and inconsistent here. The maximum likelihood estimator is consistent, and its bias is quite small in finite samples, in this experiment.

Yes, there are some situations where I'd consider using an LPM. For example, with a complex panel data-set, or when there are endogenous dummy covariates. However, in the typical case of cross-section data, and no other complications, you'll have to work pretty hard to convince me!

References

Amemiya, T., 1977. Some theorems in the linear probability model. International Economic Review, 18, 645–650.

Horrace, W. C. & R. L. Oaxaca, 2006. Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters, 90, 321–327.

24 comments:

Brian Ferguson, June 1, 2012 at 11:23 AM
Dave: Can you suggest a reference for the endogenous dummy covariate case?

Dimitriy, June 1, 2012 at 11:36 AM
Would you still do this if heteroscedasticity was an issue?

Alan Mehlenbacher, June 2, 2012 at 6:52 PM
This post has some very useful information about LPM. Thanks for the effort you put into this and your other interesting (and often entertaining!) blog posts.

I try to do my bit by being as DEMONSTRATIVE as possible in telling my Econ 345 students why they must use probit or logit instead of LPM.

Dave Giles, June 2, 2012 at 11:14 PM
Alan: Thanks! Glad it was helpful.

Scott B, July 18, 2012 at 10:30 AM
Well, Dave, you mention that with panel data you would consider using the LPM, but not with cross-section data... but there are many cases of using lots of FE in cross-section data, and I think it is *really* hard to defend non-linear models in cases such as these. FE account for variation in the data in a completely general way -- the non-linear model relies on the functional form.

Also, your Monte Carlo example is a bit of a cherry-pick; the LPM is the "wrong" model in this case. MLE will be inconsistent if the probit model is wrong, too.

Unknown, September 6, 2012 at 10:07 AM
Hi Dave,
What about when you have many interacted independent variables and a binary dependent variable (for example, interacting many independent variables with a set of dummies)? Given that calculating marginal effects of interactions is complex when there are so many, could you be justified in using the LPM in this situation?
Rose

Anonymous, January 11, 2013 at 8:32 PM
Dave, if I use a logistic link function but minimize MSE instead of using MLE, do I still have the same problems?

Dave Giles, January 12, 2013 at 10:07 AM
Personally, I'd see more sense in that than just using OLS.

Anonymous, February 5, 2013 at 11:54 AM
How did your estimates of B2 turn out?

Anonymous, February 6, 2013 at 5:48 AM
Try simulating a model in which there is a true "absolute treatment effect", e.g.

y = a0 + a1*x + e in which e ~ bernoulli

Then run LPM and logit.

Better yet add a covariate to the equation above (e.g. a2*w) and show how logit will suggest that "a1" varies with w when it actually doesn't.

Anonymous, November 1, 2014 at 1:11 AM
Very useful post, since I am now dealing with a binary dependent variable. I am still working through your other post about robust standard errors for Probit and Logit. Very helpful!! Tony

Federico Belotti, February 14, 2019 at 8:03 AM
Dear Dave,

I'd like to jump in here, even though this is an old thread.
In particular, I'd like to ask for two clarifications on the EViews code you used for the Monte Carlo analysis. I might be wrong or missing something, since I don't know the EViews syntax very well, but it seems to me that the marginal effect of x (at means) in a probit model should be

@dnorm(c(1)+c(2)*@mean(x))*c(2)

instead of

@cnorm(c(1)+c(2)*@mean(x))*(1-@cnorm(c(1)+c(2)*@mean(x)))*c(2)

Am I wrong?
Second: Why did you consider the marginal effect at the mean instead of the average marginal effect? How is the regressor x generated? I wasn't able to find it looking at the code.

Many thanks,
Federico


Anonymous, March 29, 2019 at 11:36 AM
There is no such thing as the "true marginal effect", because the marginal effect depends on x. It is the researcher who chooses to calculate it for the mean x, but why should we care about the derivative at this particular point? This is just one possible x, no better and no worse than any other. LPM (OLS) gives you a weighted-average of marginal effects at different values of x. Of course it will be a different number! (And even more if the distribution of x is ugly). This is like an apples-to-oranges comparison. But this is not a problem of OLS per se, it is a problem of the choice of mean x as the point where marginal effect was calculated. A somewhat fairer test could be to at least calculate the average partial effect of MLE and then compare it to OLS - these are the two competing ways of aggregating marginal effects into one single parameter of interest. OLS can be seen as a more convenient one, especially since MLE (and hence average partial effects) relies on untestable distributional assumptions to identify the parameter of interest: here you even simulated a normal epsilon and selectively picked a model that assumes a normal epsilon, but in real data we would never know...