Thursday, June 4, 2015

Logit, Probit, & Heteroskedasticity

I've blogged previously about specification testing in the context of Logit and Probit models. For instance, see here and here.

Testing for homoskedasticity in these models is especially important, for reasons that are outlined in those earlier posts. I won't repeat all of the details here, but I'll just note that heteroskedasticity renders the MLE of the parameters inconsistent. (This stands in contrast to the situation in, say, the linear regression model, where heteroskedasticity leaves the MLE of the coefficients inefficient, but still consistent.)

If you're an EViews user, you can find my code for implementing a range of specification tests for Logit and Probit models here. These include the LM test for homoskedasticity that was proposed by Davidson and MacKinnon (1984).

More than once, I've been asked the following question:
"When estimating a Logit or Probit model, we set the scale parameter (variance) of the error term to the value one, because it's not actually identifiable. So, in what sense can we have heteroskedasticity in such models?"
This is a good question, and I thought that a short post would be justified. Let's take a look:

We'll consider a fairly standard way of motivating a binary choice model.

First, we'll assume that there is some unobserved ("latent") variable, y*, that is a linear function of some regressors ("covariates") and parameters:

               y* = x'β + σ ε ,                                                                             (1)

where ε follows some distribution, F(.), with E(ε) = 0.

You'll recall that if F(.) = Φ(.), the standard normal cumulative distribution function (c.d.f.) then we get the Probit model. Similarly, if F(.) = Λ(.), the c.d.f. for the logistic distribution, then we get the Logit model.

What we actually observe (assign) is a value for an indicator variable, y, with:

            y = 1     if   y* > 0

            y = 0      if   y* ≤ 0 .                                                                       (2)

(The zero threshold for y* is arbitrary, as long as the regressors include an intercept "variable"; and the zero-one values for y are not important. They are used simply to partition the sample into two parts.)

Also, we can arbitrarily set σ = 1 for the following reason. We can re-write equation (1) as:

          ( y* / σ) = x' (β / σ) +  ε .                                                                 (3)

Then, notice that (because the standard deviation, or "scale parameter", σ, must be positive), the sign of y* will be the same as that of  ( y* / σ). So, regardless of whether we use (1) or (3), the observed variable, y, will take its zero-one values in exactly the same way. It won't matter if we have σ = 1, σ = 10, or any other value. The situation will be the same. In other words, the parameter, σ, can't be identified in this model. We might as well set it to a value of one.
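
A small Python sketch of this point, using only the standard library. The sample size, seed, and "true" parameter values are made up for illustration, and the helper name observed_y is hypothetical. It generates y from equations (1) and (2), then multiplies both β and σ by the same positive constant and checks whether the observed zeros and ones change:

```python
import random

def observed_y(beta0, beta1, sigma, xs, eps):
    """Equations (1)-(2): y = 1 if y* = beta0 + beta1*x + sigma*eps > 0, else 0."""
    return [1 if beta0 + beta1 * x + sigma * e > 0 else 0
            for x, e in zip(xs, eps)]

rng = random.Random(2015)                        # arbitrary seed (assumption)
n = 1000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]     # one made-up regressor
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]    # standard normal errors

beta0, beta1 = 0.5, 1.0                          # hypothetical "true" values
y_base = observed_y(beta0, beta1, 1.0, xs, eps)  # sigma = 1

# Multiply beta and sigma by the same positive constant c.
for c in (0.1, 10.0, 250.0):
    y_scaled = observed_y(c * beta0, c * beta1, c * 1.0, xs, eps)
    print(c, y_scaled == y_base)
```

Because the data on y are the same for every choice of c, no estimator based on y and x alone can tell these parameter settings apart.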

From equation (2), note that, if F(.) is a symmetric distribution, as is the case if it is normal or logistic, then:

Pr[y = 1 | x] = Pr[y* > 0 | x] = Pr[ε > -x'β | x] = 1 - F(-x'β) = F(x'β)

Pr[y = 0 | x] = Pr[y* ≤ 0 | x] = Pr[ε ≤ -x'β | x] = F(-x'β) = 1 - F(x'β) .
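
As a rough numerical check of these two probabilities for the Probit case, the following sketch simulates the latent variable for one fixed value of x'β (the value 0.7, the seed, and the number of draws are arbitrary assumptions) and compares the share of draws with y = 1 to Φ(x'β):

```python
import random
from statistics import NormalDist

Phi = NormalDist().cdf             # standard normal c.d.f.

rng = random.Random(42)            # arbitrary seed (assumption)
index = 0.7                        # hypothetical value of x'beta for one fixed x
draws = 200_000

# Share of draws with y* = x'beta + eps > 0, i.e. with y = 1.
share_y1 = sum(index + rng.gauss(0.0, 1.0) > 0 for _ in range(draws)) / draws

# The simulated share should be close to F(x'beta) = Phi(0.7).
print(round(share_y1, 3), round(Phi(index), 3))

# Symmetry of F is what turns 1 - F(-x'beta) into F(x'beta).
print(1.0 - Phi(-index), Phi(index))
```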

Assuming independence in the sample, the log-likelihood function takes the form:

           L = Σ{y_i log(F(x_i'β)) + (1 - y_i) log[1 - F(x_i'β)]} ;                     (4)

where the range of summation is from i = 1 to n, the sample size; and all logarithms are "natural" ones. The (row) vector, x_i', has elements that are the ith observation on each of the regressors.
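
Equation (4) is easy to code up directly. Here is a minimal sketch in plain Python; the function names and the tiny data set are invented for illustration, and no attempt is made to guard against log(0) when the index is extreme:

```python
import math
from statistics import NormalDist

def probit_cdf(v):
    """F = Phi, the standard normal c.d.f. (Probit)."""
    return NormalDist().cdf(v)

def logit_cdf(v):
    """F = Lambda, the logistic c.d.f. (Logit)."""
    return 1.0 / (1.0 + math.exp(-v))

def loglik(beta, X, y, cdf):
    """Equation (4): sum of y_i*log F(x_i'b) + (1 - y_i)*log[1 - F(x_i'b)]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = cdf(sum(b * x for b, x in zip(beta, x_i)))
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Tiny made-up data set; the first column of X is the intercept "variable".
X = [(1.0, -1.2), (1.0, 0.3), (1.0, 0.8), (1.0, 2.1)]
y = [0, 0, 1, 1]

print(loglik((0.0, 0.0), X, y, probit_cdf))   # every probability is 0.5 here
print(loglik((-0.5, 1.0), X, y, probit_cdf))
print(loglik((-0.5, 1.0), X, y, logit_cdf))
```

Note that σ appears nowhere in the function: the only inputs are β, the data, and the choice of F(.).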

So, how can we have a model that is heteroskedastic if the scale parameter, σ, doesn't even appear in the likelihood function, because it is not identifiable?

To answer this question, let's consider a more general specification of the model. To make the form of the error structure concrete, suppose that F(.) = Φ(.), so we are talking about a Probit model. Then, we might consider a generalization of (1), such as:

           y_i* = x_i'β + ε_i   ;   ε_i ~ N[0 , exp(z_i'γ)]  .                                            (5)

The (row) vector, z_i', has elements that are the ith observation on each of some exogenous variables; and γ is a conformable vector of unknown parameters. Some of the elements of z may also be elements of x.

The use of the exponential function ensures that the variance is positive, regardless of the signs or values of the elements of z_i and γ. To be sure that all of the elements of both β and γ can be identified, we mustn't have a constant "variable" among the z's. (A constant in z would simply rescale the variance of every observation by the same factor, and we've already seen that a common scale factor can't be identified.) Notice that if γ = 0, then we are back to the (homoskedastic) model in (1), with σ = 1.
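
To see why a constant among the z's causes trouble, note that the probability that y = 1 depends on the parameters only through x'β divided by the error's standard deviation. The sketch below (all numbers are made up, and scaled_index is a hypothetical helper) shows that a model with a constant in z gives exactly the same scaled index as a model without one, once β is rescaled:

```python
import math

def scaled_index(x_i, beta, z_i, gamma):
    """x_i'beta divided by the standard deviation sqrt(exp(z_i'gamma)) from (5)."""
    xb = sum(b * x for b, x in zip(beta, x_i))
    zg = sum(g * z for g, z in zip(gamma, z_i))
    return xb / math.sqrt(math.exp(zg))

x_i = (1.0, 0.4)                 # intercept plus one regressor (made-up values)
beta = (0.5, 1.0)
g0, g1 = 0.8, -0.3               # g0 multiplies a constant wrongly included in z

with_constant = scaled_index(x_i, beta, (1.0, 2.0), (g0, g1))

# Same index from a model with no constant in z and beta rescaled by exp(-g0/2).
beta_rescaled = tuple(b * math.exp(-g0 / 2) for b in beta)
without_constant = scaled_index(x_i, beta_rescaled, (2.0,), (g1,))

print(with_constant, without_constant)

# With gamma = 0 the standard deviation is 1 and the index is just x_i'beta.
print(scaled_index(x_i, beta, (2.0,), (0.0,)))
```

Two different parameter settings that produce the same index for every observation produce the same likelihood, so the data can't choose between them.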

In this case, the log-likelihood function takes the form:

            L = Σ{y_i log(Φ(x_i'β / √(exp(z_i'γ)))) + (1 - y_i) log[1 - Φ(x_i'β / √(exp(z_i'γ)))]} .   (6)

In particular, the log-likelihood function, (6), now includes the parameters that make up the elements of γ.
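
A minimal Python sketch of (6), again with invented data and a hypothetical function name. Evaluating it at γ = 0 reproduces the homoskedastic Probit log-likelihood in (4), while a non-zero γ changes the value for the same β:

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def loglik_het(beta, gamma, X, Z, y):
    """Equation (6): Probit log-likelihood with Var(eps_i) = exp(z_i'gamma)."""
    total = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        xb = sum(b * x for b, x in zip(beta, x_i))
        sd = math.sqrt(math.exp(sum(g * z for g, z in zip(gamma, z_i))))
        p = Phi(xb / sd)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Tiny made-up data set; X has an intercept column, Z has no constant.
X = [(1.0, -1.2), (1.0, 0.3), (1.0, 0.8), (1.0, 2.1)]
Z = [(0.5,), (1.5,), (-0.7,), (2.0,)]
y = [0, 0, 1, 1]
beta = (-0.5, 1.0)

ll_homo = loglik_het(beta, (0.0,), X, Z, y)   # gamma = 0: collapses to (4)
ll_het = loglik_het(beta, (0.6,), X, Z, y)    # gamma != 0: a different value
print(ll_homo, ll_het)
```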

Now we have some intuition as to why the MLE of β will be inconsistent if we ignore the heteroskedasticity and simply maximize the log-likelihood in equation (4).

By working with the usual log-likelihood function, (4), we'd be incurring a specification error. In fact, you can see that the nature of the specification error is essentially that of omitting relevant effects. As usual, this leads to an inconsistent estimator of the parameters when we apply MLE. 
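
The inconsistency can be illustrated with a deliberately simple simulation. This is only a sketch under strong assumptions, not a general proof: there is a single regressor taking the values -1 and +1, no intercept, and a single binary z, with made-up values β = 1 and γ = log(9) so that the error standard deviation is either 1 or 3. This design is chosen because the homoskedastic Probit MLE then has a closed form, so no numerical optimizer is needed:

```python
import math
import random
from statistics import NormalDist

rng = random.Random(7)                     # arbitrary seed (assumption)
beta, gamma = 1.0, math.log(9.0)           # hypothetical true values
n = 100_000

matches = 0
for _ in range(n):
    x = rng.choice((-1.0, 1.0))            # single regressor, no intercept
    z = rng.choice((0.0, 1.0))             # variance driver, as in (5)
    sd = math.sqrt(math.exp(z * gamma))    # equals 1 or 3
    y = 1 if beta * x + rng.gauss(0.0, sd) > 0 else 0
    matches += (y == 1) == (x > 0)         # y "agrees" with the sign of x

# With x = +/-1 and no intercept, (4) reduces to
#   matches*log(Phi(b)) + (n - matches)*log(1 - Phi(b)),
# so the homoskedastic Probit MLE solves Phi(b) = matches / n.
b_hat = NormalDist().inv_cdf(matches / n)

# Value that b_hat converges to under this data-generating process.
Phi = NormalDist().cdf
plim = NormalDist().inv_cdf(0.5 * Phi(beta) + 0.5 * Phi(beta / 3.0))

print(b_hat, plim, beta)
```

In this design the estimate from (4) settles near 0.63 rather than the true value of 1, and making n larger doesn't help: the estimator is converging, but to the wrong number.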

Although this discussion has been in terms of a basic Logit or Probit model, the same general points apply to generalizations of these models (such as multinomial or nested Logit), to the Tobit model, and to some other related models.

If you need additional references on the topic of specification testing in the context of discrete choice/limited dependent variable models, you'll find a whole bunch of them on my principal website.


Davidson, R. & J. G. MacKinnon, 1984. Convenient specification tests for logit and probit models. Journal of Econometrics, 25, 241-262.

© 2015, David E. Giles


  1. There is one log too much in equations (4) and (6). I think the last term should be: ....+(1 - yi) log[1 - F(xi'β)]}? What do you think?

    1. Thanks for that - all fixed! (Bad "copy and paste" on my part.)

  2. Hello, thanks for very interesting blog :) I would like to ask short question regarding log-likelihood in equation 6. Should not there be x*beta over square root of exponential function? It does not change any intuition, but just for technicality. Thanks in advance!

    1. Ketevani - thanks for spotting that :-) Now fixed. DG