Sunday, November 18, 2012

Assessing Heckman's Two-Step Estimator

Good survey papers are worth their weight in gold. Reading and digesting a thoughtful, constructive, and well-researched survey can save you a lot of work. It can also save you from making poor choices in your own research, or even from "re-inventing the wheel".

For these reasons, The Journal of Economic Surveys is a great resource. Over the years it has published some really fine peer-reviewed survey articles, many of which I've benefited from personally.

Another piece of good news is that Wiley (the journal's publisher) makes a number of the most highly cited articles available for free.

A great example of such a survey is "The Heckman Correction for Sample Selection and its Critique", by Patrick A. Puhani, published in 2000. You can access Patrick's paper freely here.

Heckman's two-step estimator is very widely used in microeconometrics. However, as this survey of the literature that assesses the estimator's merits shows, choosing it is not always advisable. Table 1 of the paper puts everything together in an easily accessible way, and I'm going to quote the overall conclusions almost in their entirety:
"The general conclusions which may be drawn from the surveyed Monte Carlo studies as well as the theoretical considerations cast doubt on the omnipotence implicitly ascribed by many applied researchers to Heckman's (1976, 1979) two-step estimator. Indeed, Heckman himself is confirmed when he writes that the purpose of his estimator is only to 'provide good starting values for maximum likelihood estimation' and 'exploratory empirical work.' (Heckman, 1979, p. 160). 
The cases where the need to correct for selectivity bias are largest are those with a high correlation between the error terms of the selection and the outcome equation, and those with a high degree of censoring. Unfortunately, though, as the Monte Carlo analyses show, in exactly those cases Heckman's estimator is particularly inefficient and subsample OLS may therefore be more robust. In addition, empirical researchers are often confronted with a high correlation between the exogenous variables in the selection and the outcome equation. Because the inverse Mills ratio is approximately linear over wide ranges of its argument, such high correlation is likely to make Heckman's LIML, but also the FIML estimator very unrobust due to the collinearity between the inverse Mills ratio and the other regressors.
The practical advice one may draw from these results, for example for the estimation of empirical wage equations, is that the estimation method should be decided upon case by case. A first step should be to investigate whether there are collinearity problems in the data. This can be done by calculating R² of the regression of the inverse Mills ratio on the regressors of the outcome equation or by calculating the corresponding condition number ... If collinearity problems are present, subsample OLS (or the Two-Part Model) may be the most robust and simple-to-calculate estimator. If there are no collinearity problems, Heckman's LIML estimator may be employed, but given the constant progress in computing power, the FIML estimator is recommended, as it is usually more efficient than the LIML estimator."
(Puhani, 2000, pp. 64-65.)

This all sounds like good advice, and I'm not aware of any recent developments in this literature that suggest otherwise.
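
To make the "first step" in Puhani's advice concrete, here is a rough Python sketch of the two collinearity diagnostics he mentions: the R² from regressing the inverse Mills ratio on the outcome-equation regressors, and a condition number for the augmented regressor matrix. This is only an illustration under my own assumptions, not code from the paper. The function names (`inverse_mills`, `collinearity_diagnostics`) are hypothetical, the data are simulated, and the true selection index stands in for the fitted index you would get from a first-stage probit. Scaling the columns to unit length before computing the condition number is also my choice; the quoted passage doesn't specify one.

```python
import math
import numpy as np


def inverse_mills(index):
    """Inverse Mills ratio phi(index) / Phi(index) for a standard normal."""
    index = np.asarray(index, dtype=float)
    pdf = np.exp(-0.5 * index**2) / math.sqrt(2.0 * math.pi)
    # Phi(a) = 0.5 * erfc(-a / sqrt(2)); erfc is accurate in the lower tail.
    cdf = 0.5 * np.vectorize(math.erfc)(-index / math.sqrt(2.0))
    return pdf / cdf


def collinearity_diagnostics(X, index):
    """Return (R^2, condition number) for the selected subsample.

    X     : outcome-equation regressors (without a constant)
    index : selection index for the same observations (assumed here to
            come from a first-stage probit; hypothetical input)
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    imr = inverse_mills(index)

    # R^2 from regressing the inverse Mills ratio on a constant and X.
    W = np.column_stack([np.ones(len(imr)), X])
    coef, *_ = np.linalg.lstsq(W, imr, rcond=None)
    resid = imr - W @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((imr - imr.mean()) ** 2)

    # Condition number of [1, X, IMR], columns scaled to unit length
    # (an assumed scaling convention).
    full = np.column_stack([W, imr])
    full = full / np.linalg.norm(full, axis=0)
    return r2, np.linalg.cond(full)


# Simulated illustration (all numbers are made up).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)  # regressor in the outcome equation
z = rng.normal(size=n)  # extra variable that affects selection only
u = rng.normal(size=n)  # selection-equation error

index_a = 0.5 + x            # case A: same regressors in both equations
index_b = 0.5 + x + 2.0 * z  # case B: selection also depends on z

sel_a = index_a + u > 0
sel_b = index_b + u > 0

r2_a, cond_a = collinearity_diagnostics(x[sel_a], index_a[sel_a])
r2_b, cond_b = collinearity_diagnostics(x[sel_b], index_b[sel_b])

print(f"Case A (no exclusion restriction): R2 = {r2_a:.3f}, cond = {cond_a:.1f}")
print(f"Case B (z excluded from outcome):  R2 = {r2_b:.3f}, cond = {cond_b:.1f}")
```

In a setup like case A, where the selection index is a function of the outcome regressors alone, the inverse Mills ratio is close to a linear function of those regressors, so the R² should come out much higher than in case B. That is the situation in which, following the survey, subsample OLS or the Two-Part Model may be the safer choice.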


Heckman, J. J., 1976. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 4, 475-492.

Heckman, J. J., 1979. Sample selection bias as a specification error. Econometrica, 47, 1, 153-161.

Puhani, P. A., 2000. The Heckman correction for sample selection and its critique. Journal of Economic Surveys, 14, 1, 53-68.

© 2012, David E. Giles
