## Saturday, June 25, 2016

### Choosing Between the Logit and Probit Models

I've had quite a bit to say about Logit and Probit models, and the Linear Probability Model (LPM), in various posts in recent years. (For instance, see here.) I'm not going to bore you by going over old ground again.

However, an important question came up recently in the comments section of one of those posts. Essentially, the question was, "How can I choose between the Logit and Probit models in practice?"

I responded to that question by referring to a study by Chen and Tsurumi (2010), and I think it's worth elaborating on that response here, rather than leaving the answer buried in the comments of an old post.

So, let's take a look.

Putting the LPM entirely to one side (where, as far as I'm concerned, it rightly belongs!), the issue is whether a standard normal distribution, or a logistic distribution, is the better choice when it comes to modelling the link between our discrete dependent variable and the regressors (covariates). If we choose the normal distribution we end up with the so-called Probit model; and if we choose the logistic distribution we end up with the Logit model.
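To make the distinction concrete, here is a minimal sketch of the two specifications. In both cases the probability that the dependent variable equals one is a cumulative distribution function evaluated at a linear index of the regressors. The function names, the coefficient values and the single regressor `x` are hypothetical, chosen only for illustration.

```python
import math

def probit_prob(index):
    """P(y = 1) under the Probit model: standard normal CDF of the index."""
    return 0.5 * math.erfc(-index / math.sqrt(2.0))

def logit_prob(index):
    """P(y = 1) under the Logit model: standard logistic CDF of the index."""
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical coefficients (intercept and slope), for illustration only.
beta0, beta1 = -0.5, 1.2

for x in (-1.0, 0.0, 1.0, 2.0):
    index = beta0 + beta1 * x
    print(f"x = {x:5.1f}  probit: {probit_prob(index):.4f}  "
          f"logit: {logit_prob(index):.4f}")
```

Feeding the same coefficients into both functions gives different probabilities, because the standard logistic distribution has a larger variance (π²/3) than the standard normal. This is why estimated Logit and Probit coefficients are not directly comparable, even when the fitted probabilities are close.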

Let's begin by asking, "how much are the results likely to differ when we make one of these choices or the other?"

The short answer is, "not very much, in general." So, this may seem to suggest that we can basically flip a coin when it comes to deciding whether to go the Logit route or the Probit route. However, it's not quite that simple.

Why not?

First, the answer given above relates to the simple case where we have a binomial Logit or Probit model. That is, there are only two discrete choices for our qualitative variable. As soon as we move to the multinomial case, where there are three or more choices, the story changes fundamentally. In particular, the multinomial Logit model is computationally simpler to implement than is the multinomial Probit model, and this may factor into our choice. On the other hand, there is the well-known problem associated with the "Independence of Irrelevant Alternatives" that arises with the multinomial Logit model, but not with the multinomial Probit model. So there are pros and cons when it comes to making this choice in the multinomial case.
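The "Independence of Irrelevant Alternatives" property is easy to see numerically. The sketch below computes multinomial Logit choice probabilities from a set of systematic utilities. The alternatives and the utility values are made up (a version of the familiar red bus/blue bus illustration), and `mnl_probs` is a hypothetical helper, not a library function.

```python
import math

def mnl_probs(utilities):
    """Multinomial Logit probabilities from a dict of systematic utilities."""
    top = max(utilities.values())  # subtract the max for numerical stability
    expu = {alt: math.exp(u - top) for alt, u in utilities.items()}
    total = sum(expu.values())
    return {alt: e / total for alt, e in expu.items()}

# Hypothetical utilities, for illustration only.
two = mnl_probs({"car": 1.0, "red_bus": 0.5})
three = mnl_probs({"car": 1.0, "red_bus": 0.5, "blue_bus": 0.5})

odds_two = two["car"] / two["red_bus"]
odds_three = three["car"] / three["red_bus"]
print(f"odds car vs. red bus, two alternatives:   {odds_two:.4f}")
print(f"odds car vs. red bus, three alternatives: {odds_three:.4f}")
```

The odds of choosing the car over the red bus are the same in both cases, even though the added blue bus is a near-perfect substitute for the red one. The multinomial Logit model forces this restriction on the data.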

Second, even when we restrict ourselves to the standard binomial (zero-one) case, there can be some marked differences between Logit and Probit results when we focus on the tails of the underlying distributions (e.g., Cox, 1966).
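A rough numerical check shows both sides of this story. The sketch below rescales the logistic index by 1.702, a commonly used constant that brings the logistic CDF close to the standard normal CDF. The constant, the grid and the evaluation points are my own choices for illustration, and none of this is taken from Cox (1966).

```python
import math

def probit_cdf(z):
    """Standard normal CDF (erfc keeps accuracy in the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def logit_cdf(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.702  # assumed rescaling constant for the logistic index

# Largest absolute gap between the two CDFs on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(logit_cdf(SCALE * z) - probit_cdf(z)) for z in grid)
print(f"largest absolute gap: {max_gap:.4f}")

# Ratio of the tail probabilities as we move further into the lower tail.
tail_ratios = {}
for z in (-1.0, -2.0, -3.0, -4.0):
    tail_ratios[z] = logit_cdf(SCALE * z) / probit_cdf(z)
    print(f"z = {z:4.1f}  logistic / normal tail probability: "
          f"{tail_ratios[z]:.2f}")
```

In absolute terms the two curves are never far apart, but in relative terms the logistic tail is much heavier, and the ratio grows quickly as we move away from the centre. That is where very unbalanced data can make the two models behave differently.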

So, it's still interesting to think about whether we can come up with some formal statistical procedure to help us to decide between the Logit and Probit models, when we have the same (limited) dependent variable.

These two models are "non-nested", so a natural way to proceed is to use some information criterion or other to discriminate between them. This applies whether we're talking about a binomial model or a multinomial model. Note that this is not an example of hypothesis testing. Rather, we're effectively "ranking" the Probit and Logit models. (For some general comments about the use of information criteria in other contexts, see my earlier posts here and here.)
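As a sketch of what such a ranking looks like in practice, the code below fits a binary Logit and a binary Probit model to the same simulated data by maximum likelihood and compares their AIC values. Everything here is an assumption made for illustration: the simulated (unbalanced) data, the seed, the true coefficients, the single regressor, and the use of Fisher scoring as the optimizer. It is not the procedure used in any of the studies cited in this post.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logis_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logis_pdf(z):
    p = logis_cdf(z)
    return p * (1.0 - p)

def clipped(p, eps=1e-12):
    """Keep probabilities strictly inside (0, 1) to avoid log(0)."""
    return min(max(p, eps), 1.0 - eps)

def log_lik(b0, b1, x, y, cdf):
    total = 0.0
    for xi, yi in zip(x, y):
        p = clipped(cdf(b0 + b1 * xi))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit(x, y, cdf, pdf, max_iter=100, tol=1e-10):
    """Fisher scoring for P(y=1|x) = F(b0 + b1*x); returns (b0, b1, loglik)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            z = b0 + b1 * xi
            p = clipped(cdf(z))
            f = pdf(z)
            w = f / (p * (1.0 - p))
            g0 += w * (yi - p)          # score, intercept
            g1 += w * (yi - p) * xi     # score, slope
            i00 += w * f                # expected information matrix
            i01 += w * f * xi
            i11 += w * f * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, log_lik(b0, b1, x, y, cdf)

# Simulated data from a logistic model (hypothetical coefficients).
random.seed(2016)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < logis_cdf(-1.5 + 1.0 * xi) else 0 for xi in x]

k = 2  # parameters in each model: intercept and slope
results = {}
for name, cdf, pdf in (("logit", logis_cdf, logis_pdf),
                       ("probit", norm_cdf, norm_pdf)):
    b0, b1, ll = fit(x, y, cdf, pdf)
    results[name] = {"b0": b0, "b1": b1, "loglik": ll, "aic": 2 * k - 2 * ll}
    print(f"{name:6s}  b0 = {b0:.3f}  b1 = {b1:.3f}  AIC = {results[name]['aic']:.2f}")

preferred = min(results, key=lambda m: results[m]["aic"])
print("model with the smaller AIC:", preferred)
```

Because the two models have the same number of parameters, ranking them by AIC is the same as ranking them by their maximized log-likelihoods. In a single sample the difference between the two values can be small, so the ranking may not be very decisive.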

One of the few studies to evaluate the effectiveness of alternative information criteria to discriminate between Logit and Probit models is that by Chen and Tsurumi (2010). They consider five different criteria, namely:

1. The deviance information criterion (DIC).
2. The predictive deviance information criterion (PDIC).
3. The unweighted sum of squared errors (USSE).
4. The weighted sum of squared errors (WSSE).
5. Akaike's information criterion (AIC).

The main conclusions emerging from the Chen-Tsurumi paper are as follows, and they aren't all that encouraging:

- If the binary data that are being modelled are "balanced" (i.e., there is roughly a 50-50 split between the zero and one values), then none of the above information criteria are very effective at discriminating properly between the Logit and Probit models.
- If the data are "unbalanced", then only the DIC and AIC criteria are effective.
- The more information that is available about the higher moments of the underlying distribution of the binary data, the more effective are these criteria in the "unbalanced" case.
- Sample sizes of at least 1,000 are needed to be able to discriminate between the Logit and Probit models using this approach.

If these information criteria don't help us very much, is there some other way to choose between the Logit and Probit specifications?

Another option is to think of using a classical hypothesis test. As I noted above, the two models are non-nested, and this has to be taken into account. This approach is followed, for example, by Chambers and Cox (1967). First, they take the Logistic specification as the null hypothesis, and seek a power-maximizing test against the Probit alternative. Then they construct a test with the null and alternative hypotheses reversed.

Once again, the authors' simulation experiments are not particularly encouraging, and relatively large sample sizes are needed for the tests to have appreciable power.

In summary, if you're really concerned about discriminating/selecting between the Logit and Probit models, then there are some tools that are available, but they are only modestly effective.

There's certainly some room for more research into this topic.

References

Chambers, E. A. and D. R. Cox, 1967. Discrimination between alternative binary response models. Biometrika, 54, 573–578.

Chen, G. and H. Tsurumi, 2010. Probit and logit model selection. Communications in Statistics - Theory and Methods, 40, 159–175.

Cox, D. R., 1966. Some procedures connected with the logistic qualitative response curve. In Research Papers in Statistics: Festschrift for J. Neyman (F. N. David, ed.), Wiley, London.