So you're still thinking of using a Linear Probability Model (LPM) - also known in the business as good old OLS - to estimate a binary dependent variable model?
Well, I'm stunned!
Yes, yes, I've heard all of the "justifications" (excuses) for using the LPM, as opposed to using a Logit or Probit model. Here are a few of them:
- It's computationally simpler.
- It's easier to interpret the "marginal effects".
- It avoids the risk of mis-specification of the "link function".
- There are complications with Logit or Probit if you have endogenous dummy regressors.
- The estimated marginal effects from the LPM, Logit and Probit models are usually very similar, especially if you have a large sample size.
Oh really? That don't impress me much!
Why not? Well, in almost all circumstances, the LPM yields biased and inconsistent estimates. You didn't know that? Then take a look at the paper by Horrace and Oaxaca (2006), and some previous results given by Amemiya (1977)!
It's not the bias that worries me in this particular context - it's the inconsistency. After all, the MLEs for the Logit and Probit models are also biased in finite samples - but they're consistent. Given the sample sizes that we usually work with when modelling binary data, it's consistency and asymptotic efficiency that are of primary importance.
Why would you choose to use a modelling/estimation strategy that will give you the wrong answer, with probability one, even if you have an infinitely large sample? Wouldn't you rather use one that will give you the right answer, with certainty, if the sample size is very, very large?
If you're not feeling up to reading the Horrace and Oaxaca paper - bad news is always unpleasant - here's their key result. Let xi denote the ith observation on the vector of covariates, and let β be the vector of associated coefficients. So, xiβ is the true value of the ith element of the "prediction vector".
OLS estimation of the Linear Probability Model will be both biased and inconsistent, unless it happens to be the case that 0 ≤ xiβ ≤ 1, for every i.
How likely is that? In addition, notice that xiβ isn't observed, in practice.
The bottom line, in their words (H & O, p.326):
"Although it is theoretically possible for OLS on the LPM to yield unbiased estimation, this generally would require fortuitous circumstances. Furthermore, consistency seems to be an exceedingly rare occurrence as one would have to accept extraordinary restrictions on the joint distribution of the regressors. Therefore, OLS is frequently a biased estimator and almost always an inconsistent estimator of the LPM."
Perhaps a small Monte Carlo simulation experiment will be helpful?
Here's what I've done. I generated binary data for my dependent variable, y, as follows:
yi* = β1 + β2xi + εi ;  εi ~ iid N[0, 1]
yi = 1, if yi* > 0
yi = 0, if yi* ≤ 0 ;  i = 1, 2, ..., n
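This data-generating process is easy to replicate. The original experiment was done in EViews, but here's a minimal sketch in Python; the parameter values (β1 = -0.5, β2 = 1.0) are illustrative assumptions of mine, as the post doesn't state the true betas it used:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustrative parameter values (not stated in the post)
beta1, beta2 = -0.5, 1.0
n = 100

x = rng.normal(size=n)             # covariate
eps = rng.standard_normal(n)       # eps_i ~ iid N(0, 1)
y_star = beta1 + beta2 * x + eps   # latent variable y_i*
y = (y_star > 0).astype(int)       # y_i = 1 if y_i* > 0, else 0
```

The latent variable y* is discarded; only the binary indicator y and the covariate x are passed to the estimation stage.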
Then, I estimated both a Probit model and an LPM, using the y and x data. In each case I computed the estimated marginal effect for x. (In the case of the LPM, this is just the OLS estimate of the coefficient of x.) Of course, as I know the true values of β1 and β2 in the data-generating process, I also know the true value of the marginal effect.
I've focused on marginal effects because they are more interesting than the parameters themselves in a Logit or Probit model. In addition, unlike the parameter estimates, they can be compared meaningfully between the LPM and Logit or Probit models.
For a fixed sample size, n, I replicated this 5,000 times. Various sample sizes were considered.
The EViews workfile and program file are on the code page for this blog. (I'm sorry that I didn't do this one in R! However, the EViews program file is also there as a text file for ease of viewing.)
Here are some summary results (Probit followed by LPM), first for n = 100:
Comparing the means of the sampling distributions for the estimated marginal effects, with the true value of 0.3526, you can see that the LPM (OLS) result is biased downwards, whereas the Probit result has virtually no bias.
Remember, this is for a sample size of n = 100. Now let's increase the sample size to n = 250:
Notice that the mean of the sampling distribution for the Probit-estimated marginal effect is very close to the true value of 0.2963. In contrast, the OLS/LPM estimator of the marginal effect is biased upwards (being 0.324, on average).
Now, let's explore the asymptotics a little by considering n = 5,000, and n = 10,000:
We can see that the marginal effect implied by OLS estimation of the LPM is converging to a value of approximately 0.32, as n increases. The standard deviation of the sampling distribution for that estimator is getting smaller and smaller. The inconsistency of the estimator is being driven by asymptotic bias.
On the other hand, the marginal effect implied by maximum likelihood estimation of the Probit model is converging to the true marginal effect - even though the latter value is changing as n increases (because the marginal effect is evaluated at the sample mean of x, which differs across the generated datasets). In fact, this convergence is quite rapid. The (small) bias vanishes, and the standard deviation of the estimator continues to decrease, as the sample size grows.
When we look at the proportion of sample values for which the true value of (β1 + β2xi) lies in the unit interval, the results for n = 100, 250, 5000, and 10000 are 20%, 15.2%, 19.2%, and 19.1% respectively. These values are a long way from 100%.
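Checking this proportion for yourself is a one-liner. Here's a sketch under the same assumed parameter values as before; note that the share you get depends on the betas and on the distribution of x, so it won't match the post's figures exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
beta1, beta2 = -0.5, 1.0          # assumed illustrative values
x = rng.normal(size=10_000)

idx = beta1 + beta2 * x           # true linear index, beta1 + beta2*x_i
share_in_unit = np.mean((idx >= 0) & (idx <= 1))
print(f"share of beta1 + beta2*x_i in [0, 1]: {share_in_unit:.1%}")
```

Whenever this share is well below 100%, Horrace and Oaxaca's condition for unbiasedness and consistency of OLS on the LPM fails.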
So, recalling Horrace and Oaxaca's results, that's why OLS estimation of the Linear Probability Model is both biased and inconsistent, here. The maximum likelihood estimator is consistent, and its bias is quite small in finite samples, in this experiment.
Yes, there are some situations where I'd consider using an LPM. For example, with a complex panel data-set, or when there are endogenous dummy covariates. However, in the typical case of cross-section data, and no other complications, you'll have to work pretty hard to convince me!
Amemiya, T., 1977. Some theorems in the linear probability model. International Economic Review, 18, 645–650.
Horrace, W. C. & R. L. Oaxaca, 2006. Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters, 90, 321–327.
© 2012, David E. Giles