Thursday, June 4, 2015

Logit, Probit, & Heteroskedasticity

I've blogged previously about specification testing in the context of Logit and Probit models. For instance, see here and here.

Testing for homoskedasticity in these models is especially important, for reasons that are outlined in those earlier posts. I won't repeat all of the details here, but I'll just note that heteroskedasticity renders the MLE of the parameters inconsistent. (This stands in contrast to the situation in, say, the linear regression model, where heteroskedasticity makes the MLE of the coefficients inefficient, but leaves it consistent.)

If you're an EViews user, you can find my code for implementing a range of specification tests for Logit and Probit models here. These include the LM test for homoskedasticity that was proposed by Davidson and MacKinnon (1984).

More than once, I've been asked the following question:
"When estimating a Logit or Probit model, we set the scale parameter (variance) of the error term to the value one, because it's not actually identifiable. So, in what sense can we have heteroskedasticity in such models?"
This is a good question, and I thought that a short post would be justified. Let's take a look:

We'll consider a fairly standard way of motivating a binary choice model.

First, we'll assume that there is some unobserved ("latent") variable, y*, that is a linear function of some regressors ("covariates") and parameters:

               y* = x'β + σ ε ,                                                                             (1)

where ε follows some distribution, F(.), with E(ε) = 0.

You'll recall that if F(.) = Φ(.), the standard normal cumulative distribution function (c.d.f.) then we get the Probit model. Similarly, if F(.) = Λ(.), the c.d.f. for the logistic distribution, then we get the Logit model.

What we actually observe (assign) is a value for an indicator variable, y, with:

            y = 1     if   y* > 0

            y = 0      if   y* ≤ 0 .                                                                       (2)

(The zero threshold for y* is arbitrary, as long as the regressors include an intercept "variable"; and the zero-one values for y are not important. They are used simply to partition the sample into two parts.)

Also, we can arbitrarily set σ = 1 for the following reason. We can re-write equation (1) as:

          ( y* / σ) = x' (β / σ) +  ε .                                                                 (3)

Then, notice that (because the standard deviation, or "scale parameter", σ, must be positive), the sign of y* will be the same as that of (y* / σ). So, regardless of whether we use (1) or (3), the observed variable, y, will take its zero-one values in exactly the same way. It won't matter if we have σ = 1, σ = 10, or any other value. The situation will be the same. In other words, the parameter, σ, can't be identified in this model. We might as well set it to a value of one.
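Here is a minimal sketch of that identification argument in Python (standard library only). The intercept, slope, seed, and sample size are hypothetical values chosen purely for illustration, and the function name is my own. It draws one set of regressors and errors, and then generates y twice: once with σ = 1, and once with both β and σ multiplied by 10.

```python
import random

def observed_y(x, beta0, beta1, sigma, eps):
    # Latent variable as in equation (1), with an intercept and one regressor.
    y_star = beta0 + beta1 * x + sigma * eps
    # Observation rule from equation (2).
    return 1 if y_star > 0 else 0

rng = random.Random(2015)  # fixed seed, so the run is repeatable
draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]

beta0, beta1 = 0.5, -1.0  # hypothetical "true" values, for illustration only

y_sigma_1 = [observed_y(x, beta0, beta1, 1.0, e) for x, e in draws]
# Scale beta and sigma by the same positive constant; the ratio beta/sigma is unchanged.
y_sigma_10 = [observed_y(x, 10 * beta0, 10 * beta1, 10.0, e) for x, e in draws]

# Are the two sets of observed outcomes the same?
print(y_sigma_1 == y_sigma_10)
```

The two lists of zeros and ones should coincide, because only the sign of y* matters. The data can tell us about the ratio β / σ, but not about β and σ separately.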

From equation (2), note that, if F(.) is a symmetric distribution, as is the case if it is normal or logistic, then:

Pr[y = 1 | x] = Pr[y* > 0 | x] = Pr[ε > -x'β | x] = 1 - F(-x'β) = F(x'β)

Pr[y = 0 | x] = Pr[y* ≤ 0 | x] = Pr[ε ≤ -x'β | x] = F(-x'β) = 1 - F(x'β) .

Assuming independence in the sample, the log-likelihood function takes the form:

           L = Σ{ yi log[F(xi'β)] + (1 - yi) log[1 - F(xi'β)] } ;                     (4)

where the range of summation is from 1 to n, the sample size; and all logarithms are "natural" ones. The (row) vector, xi', has elements that are the ith observation on each of the regressors.
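For readers who like to see things in code, here is a rough sketch of the log-likelihood in (4), using only the Python standard library. The function names and the three-observation toy data set are my own inventions for illustration. The c.d.f. is passed as an argument, so that the same function covers both the Probit and the Logit cases.

```python
from math import exp, log
from statistics import NormalDist

def probit_cdf(t):
    # Standard normal c.d.f., Φ(t).
    return NormalDist().cdf(t)

def logit_cdf(t):
    # Logistic c.d.f., Λ(t).
    return 1.0 / (1.0 + exp(-t))

def log_likelihood(beta, X, y, cdf=probit_cdf):
    # Equation (4): sum over i of yi*log F(xi'β) + (1 - yi)*log[1 - F(xi'β)].
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * v for b, v in zip(beta, x_i))  # xi'β
        p = cdf(index)                                 # Pr[yi = 1 | xi]
        total += y_i * log(p) + (1 - y_i) * log(1.0 - p)
    return total

# Toy data (hypothetical): an intercept column plus one regressor.
X = [(1.0, 0.3), (1.0, -1.2), (1.0, 2.0)]
y = [1, 0, 1]

print(log_likelihood((0.5, -1.0), X, y))                 # Probit
print(log_likelihood((0.5, -1.0), X, y, cdf=logit_cdf))  # Logit
```

In practice you would hand a function like this to a numerical optimizer (or just use your favourite package's built-in routine). The point here is simply that σ appears nowhere in it.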

So, how can we have a model that is heteroskedastic if the scale parameter, σ, doesn't even appear in the likelihood function, because it is not identifiable?

To answer this question, let's consider a more general specification of the model. To make the form of the error structure concrete, suppose that F(.) = Φ(.), so we are talking about a Probit model. Then, we might consider a generalization of (1), such as:

           yi* = xi'β + εi   ;   εi ~ N[0 , exp(zi'γ)]  .                                            (5)

The (row) vector, zi', has elements that are the ith observation on each of some exogenous variables; and γ is a conformable vector of unknown parameters. Some of the elements of z may also be elements of x.

The use of the exponential function ensures that the variance is positive, regardless of the signs or values of the elements of zi and γ. To be sure that all of the elements of both β and γ can be identified, we mustn't have a constant "variable" among the z's. Notice that if γ = 0, then we are back to the (homoskedastic) model in (1), with σ = 1.

In this case, the log-likelihood function takes the form:

            L = Σ{ yi log[Φ(xi'β / √exp(zi'γ))] + (1 - yi) log[1 - Φ(xi'β / √exp(zi'γ))] } .   (6)

In particular, the log-likelihood function, (6), now includes the parameters that make up the elements of γ.
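A small Python sketch of (6) may help here. As before, it uses only the standard library, and the function name, parameter values, and toy data are hypothetical choices of mine. Note that the z's contain no constant, in line with the identification condition mentioned above.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal c.d.f.

def het_probit_loglik(beta, gamma, X, Z, y):
    # Equation (6): the index xi'β is divided by the standard deviation sqrt(exp(zi'γ)).
    total = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        index = sum(b * v for b, v in zip(beta, x_i))             # xi'β
        sd = sqrt(exp(sum(g * v for g, v in zip(gamma, z_i))))    # sqrt(exp(zi'γ))
        p = PHI(index / sd)                                       # Pr[yi = 1 | xi, zi]
        total += y_i * log(p) + (1 - y_i) * log(1.0 - p)
    return total

# Toy data (hypothetical). X has an intercept; Z has no constant.
X = [(1.0, 0.3), (1.0, -1.2), (1.0, 2.0)]
Z = [(0.3,), (-1.2,), (2.0,)]
y = [1, 0, 1]
beta = (0.5, -1.0)

ll_homo = het_probit_loglik(beta, (0.0,), X, Z, y)  # γ = 0: the usual Probit case
ll_het = het_probit_loglik(beta, (0.8,), X, Z, y)   # γ ≠ 0: heteroskedastic case
print(ll_homo, ll_het)
```

With γ = 0 every standard deviation equals one, so the function collapses to the ordinary Probit log-likelihood in (4). With γ ≠ 0 its value changes, which is just another way of saying that γ is now part of the likelihood.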

Now we have some intuition as to why the MLE of β will be inconsistent if we ignore the heteroskedasticity and simply maximize the log-likelihood in equation (4).

By working with the usual log-likelihood function, (4), we'd be incurring a specification error. In fact, you can see that the nature of the specification error is essentially that of omitting relevant effects. As usual, this leads to an inconsistent estimator of the parameters when we apply MLE. 
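To see the "omitted effects" point more concretely, here is a small numerical sketch in Python. It assumes a single regressor, x, that also drives the variance (so z = x), and the parameter values are hypothetical. Under (5), Pr[y = 1 | x] = Φ(h(x)), where h(x) = (β0 + β1 x) / √exp(γ x). The ordinary Probit model forces the index to be linear in x, so the sketch checks whether h(x) is linear by computing a second difference over three equally spaced points.

```python
from math import exp, sqrt

def implied_index(x, beta0, beta1, gamma):
    # The argument of Φ in equation (6), with scalar x and z = x.
    return (beta0 + beta1 * x) / sqrt(exp(gamma * x))

beta0, beta1 = 0.5, 1.0  # hypothetical values, for illustration only

def second_difference(gamma):
    # h(2) - 2*h(1) + h(0): zero when the three points lie on a straight line.
    h0, h1, h2 = (implied_index(x, beta0, beta1, gamma) for x in (0.0, 1.0, 2.0))
    return h2 - 2.0 * h1 + h0

print(second_difference(0.0))  # homoskedastic case
print(second_difference(1.0))  # heteroskedastic case
```

With γ = 0 the second difference is zero: the index is linear, as the usual Probit model assumes. With γ = 1 it is clearly non-zero, so no choice of intercept and slope in a linear index can reproduce the true probabilities. The usual model is leaving something out.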

Although this discussion has been in terms of a basic Logit or Probit model, the same general points apply to generalizations of these models (such as multinomial or nested Logit), to the Tobit model, and to some other related models.

If you need additional references on the topic of specification testing in the context of discrete choice/limited dependent variable models, you'll find a whole bunch of them on my principal website.


Davidson, R. & J. G. MacKinnon, 1984. Convenient specification tests for logit and probit models. Journal of Econometrics, 25, 241-262.

© 2015, David E. Giles


  1. There is one log too much in equations (4) and (6). I think the last term should be: ....+(1 - yi) log[1 - F(xi'β)]}? What do you think?

    1. Thanks for that - all fixed! (Bad "copy and paste" on my part.)

  2. Hello, thanks for a very interesting blog :) I would like to ask a short question regarding the log-likelihood in equation (6). Shouldn't it be x'β over the square root of the exponential function? It doesn't change the intuition; it's just a technicality. Thanks in advance!

    1. Ketevani - thanks for spotting that :-) Now fixed. DG

  3. Coming from a background in statistics rather than econ, I would like to share some of my thoughts on this discussion.

    For this discussion you use a latent variable specification of the probit model. If a latent variable is to be assumed, a normal random variable seems like a pretty natural choice. This latent variable specification allows you to introduce heteroscedasticity into the model as you discussed.
    Probit models tend not to be seen much outside of economics; elsewhere everyone tends to default to logit models. Of course, the logit model can be given an almost identical latent variable specification: Y* = XB + e, where e is instead assumed to follow a logistic distribution.

    For the logit model, however, this specification is quite uncommon. The logistic distribution is somewhat exotic; it is unlikely to come about naturally in the way that the normal distribution does (through the central limit theorem and maximum-entropy considerations). Instead, the logit model is usually specified as LOG-ODDS = XB. While the two specifications are mathematically identical, with the alternative specification we think of the observed responses as Bernoulli random variables with varying propensities to success. No latent variable is introduced. This is (to me at least) a much more natural specification, as the logistic distribution is such an unnatural distribution. This alternative specification has an important consequence, though: to introduce the same kind of heteroskedasticity as in the probit model, the link function would have to vary across individuals. This would break the log-odds interpretation of the logit model, and so this kind of heteroscedasticity doesn't make much sense in logit models. What, then, should we make of the results of a heteroscedasticity test (in the probit sense) applied to the logit model? In this case it doesn't indicate heteroscedasticity, but rather some kind of non-linearity in the effects, or some other misspecification. It seems to me, then, that the discussion of heteroscedasticity you presented here is actually a more general discussion of model misspecification. To me, the choice of model (logit/probit), and even the specification of the model (link function/latent variable), comes into play in understanding how to interpret the results of a model misspecification test.