Econometrics Beat: Dave Giles' Blog: Robust Standard Errors for Nonlinear Models

Wednesday, May 8, 2013

Robust Standard Errors for Nonlinear Models

André Richter wrote to me from Germany, commenting on the reporting of robust standard errors in the context of nonlinear models such as Logit and Probit. He said he'd been led to believe that this doesn't make much sense. I told him that I agree, and that this is another of my "pet peeves"!

Yes, I do get grumpy about some of the things I see so-called "applied econometricians" doing all of the time. For instance, see my Gripe of the Day post back in 2011. Sometimes I feel as if I could produce a post with that title almost every day!

Anyway, let's get back to André's point.

The following facts are widely known (e.g., check any recent edition of Greene's text) and it's hard to believe that anyone could get through a grad. level course in econometrics and not be aware of them:

In the case of a linear regression model, heteroskedastic errors render the OLS estimator, b, of the coefficient vector, β, inefficient. However, this estimator is still unbiased and weakly consistent.
In this same linear model, and still using OLS, the usual estimator of the covariance matrix of b is an inconsistent estimator of the true covariance matrix of b. Consequently, if the standard errors of the elements of b are computed in the usual way, they will be inconsistent estimators of the true standard deviations of the elements of b.
For this reason, we often use White's "heteroskedasticity consistent" estimator for the covariance matrix of b, if the presence of heteroskedastic errors is suspected.
This covariance estimator is still consistent, even if the errors are actually homoskedastic.
In the case of the linear regression model, this makes sense. Whether the errors are homoskedastic or heteroskedastic, both the OLS coefficient estimators and White's standard errors are consistent.
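
To make the linear-model facts above concrete, here is a minimal simulation sketch in Python (NumPy only). The data-generating process, the sample size, and the function name ols_with_white_se are illustrative assumptions, and the sandwich formula is the basic HC0 version of White's estimator, with no finite-sample correction.

```python
import numpy as np

def ols_with_white_se(X, y):
    """Return OLS coefficients with conventional and White (HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    # Conventional covariance, s^2 (X'X)^-1: appropriate only under homoskedasticity.
    s2 = (e @ e) / (n - k)
    se_usual = np.sqrt(np.diag(s2 * XtX_inv))
    # White's HC0 sandwich: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1.
    meat = (X * (e ** 2)[:, None]).T @ X
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_usual, se_white

# Assumed design: y = 1 + 2x + u, where the standard deviation of u equals x.
rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(0.0, 4.0, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])
u = rng.normal(0.0, 1.0, n) * x      # heteroskedastic errors
y = X @ beta + u

b, se_usual, se_white = ols_with_white_se(X, y)
print("OLS estimates:   ", b)
print("Conventional s.e.", se_usual)
print("White (HC0) s.e. ", se_white)
```

In this design the slope estimate should sit close to its true value of 2 despite the heteroskedasticity, while the White standard error for the slope should come out somewhat larger than the conventional one.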

However, in the case of a model that is nonlinear in the parameters:

The MLE of the parameter vector is biased and inconsistent if the errors are heteroskedastic (unless the likelihood function is modified to correctly take into account the precise form of heteroskedasticity).
This stands in stark contrast to the situation above, for the linear model.
The MLE of the asymptotic covariance matrix of the MLE of the parameter vector is also inconsistent, as in the case of the linear model.
Obvious examples of this are Logit and Probit models, which are nonlinear in the parameters, and are usually estimated by MLE.
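
The contrast with the linear model can be illustrated with a small simulation sketch. Everything in it is an illustrative assumption: a Logit model with a single binary regressor d, true coefficients (0.5, 1.0), and a latent logistic error whose scale is 1 when d = 0 and 2 when d = 1. The binary design is chosen because the probability limit of the ordinary Logit estimator can then be worked out by hand: the fitted model reproduces the two cell probabilities, so the slope converges to (0.5 + 1.0)/2 - 0.5 = 0.25 rather than to 1.0.

```python
import numpy as np

def logit_mle(X, y, iterations=25):
    """Newton-Raphson MLE for an ordinary (homoskedastic) Logit model."""
    b = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        score = X.T @ (y - p)
        information = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(information, score)
    return b

def simulate_logit(sigma_when_d_is_1, n=400_000, seed=0):
    """Fit an ordinary Logit to data whose latent error scale depends on d."""
    rng = np.random.default_rng(seed)
    d = rng.integers(0, 2, n).astype(float)          # binary regressor
    X = np.column_stack([np.ones(n), d])
    beta = np.array([0.5, 1.0])                      # assumed true coefficients
    sigma = np.where(d == 1.0, sigma_when_d_is_1, 1.0)
    latent = X @ beta + sigma * rng.logistic(size=n)
    y = (latent > 0.0).astype(float)
    return logit_mle(X, y)

b_homo = simulate_logit(1.0)   # homoskedastic latent errors
b_het = simulate_logit(2.0)    # error scale doubles when d = 1
print("Homoskedastic case:  ", b_homo)
print("Heteroskedastic case:", b_het)
```

With a sample this large, the homoskedastic run should recover a slope near 1.0, while the heteroskedastic run should settle near 0.25. More data does not help in the second case; it only pins down the wrong number more precisely.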

I've made this point in at least one previous post. The results relating to nonlinear models are really well-known, and this is why it's extremely important to test for model mis-specification (such as heteroskedasticity) when estimating models such as Logit, Probit, Tobit, etc. Then, if need be, the model can be modified to take the heteroskedasticity into account before we estimate the parameters. For more information on such tests, and the associated references, see this page on my professional website.
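
One common way of modifying the model, sketched here as an illustration rather than as the only option, is the Probit with multiplicative heteroskedasticity. The variance regressors z_i and the parameter vector γ are notation assumed for this sketch; they do not appear elsewhere in this post.

```latex
y_i^* = x_i'\beta + \varepsilon_i, \qquad
\varepsilon_i \mid x_i, z_i \sim N\!\left(0, [\exp(z_i'\gamma)]^2\right), \qquad
y_i = \mathbf{1}[y_i^* > 0]

\Pr(y_i = 1 \mid x_i, z_i) = \Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right)

\ln L(\beta, \gamma) = \sum_{i=1}^{n} \left\{
    y_i \ln \Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right)
    + (1 - y_i) \ln\!\left[1 - \Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right)\right]
\right\}
```

Here z_i excludes a constant term, because the overall scale of the latent error is not identified. Setting γ = 0 gives back the ordinary Probit, so homoskedasticity can be tested as a restriction on this larger model.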

Unfortunately, it's unusual to see "applied econometricians" pay any attention to this! They tend to just do one of two things. They either

1. use Logit or Probit, but report the "heteroskedasticity-consistent" standard errors that their favourite econometrics package conveniently (but misleadingly) computes for them. This involves a covariance estimator along the lines of White's "sandwich estimator". Or, they
2. estimate a "linear probability model" (i.e., just use OLS, even though the dependent variable is a binary dummy variable), and report the "het.-consistent" standard errors.

If they follow approach 2, these folks defend themselves by saying that "you get essentially the same estimated marginal effects if you use OLS as opposed to Probit or Logit." I've said my piece about this attitude previously (here, here, here, and here), and I won't go over it again here.

My concern right now is with approach 1 above.

The "robust" standard errors are being reported to cover the possibility that the model's errors may be heteroskedastic. But if that's the case, the parameter estimates are inconsistent. What use is a consistent standard error when the point estimate is inconsistent? Not much!!
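
The point can be put in symbols using White's quasi-maximum-likelihood (QML) result. The notation below (θ* for the pseudo-true value, f for the assumed conditional density, A and B for the two matrices in the sandwich) is assumed here for illustration.

```latex
\hat{\theta}_{QMLE} \xrightarrow{p} \theta^{*}, \qquad
\sqrt{n}\,(\hat{\theta}_{QMLE} - \theta^{*}) \xrightarrow{d}
N\!\left(0,\; A^{-1} B A^{-1}\right)

A = -E\!\left[\frac{\partial^{2} \ln f(y_i \mid x_i; \theta^{*})}
                   {\partial\theta\,\partial\theta'}\right], \qquad
B = E\!\left[\frac{\partial \ln f(y_i \mid x_i; \theta^{*})}{\partial\theta}\,
             \frac{\partial \ln f(y_i \mid x_i; \theta^{*})}{\partial\theta'}\right]
```

The sandwich A^{-1} B A^{-1} is centred on θ*, the parameter value that brings the assumed likelihood closest to the truth in the Kullback-Leibler sense. Only when the model is correctly specified does θ* coincide with the true parameter vector, and in that case A = B, so the sandwich collapses to the usual A^{-1}.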

This point is laid out pretty clearly in Greene (2012, pp. 692-693), for example. Here's what he has to say:

"...the probit (Q-) maximum likelihood estimator is not consistent in the presence of any form of heteroscedasticity, unmeasured heterogeneity, omitted variables (even if they are orthogonal to the included ones), nonlinearity of the form of the index, or an error in the distributional assumption [with some narrow exceptions as described by Ruud (198)]. Thus, in almost any case, the sandwich estimator provides an appropriate asymptotic covariance matrix for an estimator that is biased in an unknown direction." (My underlining; DG.) "White raises this issue explicitly, although it seems to receive very little attention in the literature." ... "His very useful result is that if the QMLE converges to a probability limit, then the sandwich estimator can, under certain circumstances, be used to estimate the asymptotic covariance matrix of that estimator. But there is no guarantee the QMLE will converge to anything interesting or useful. Simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the virtue of a robust covariance matrix in this setting is unclear."

Back in July 2006, on the R Help feed, Robert Duval had this to say:

"This discussion leads to another point which is more subtle, but more important...

You can always get Huber-White (a.k.a robust) estimators of the standard errors even in non-linear models like the logistic regression. However, if you believe your errors do not satisfy the standard assumptions of the model, then you should not be running that model as this might lead to biased parameter estimates.

For instance, in the linear regression model you have consistent parameter estimates independently of whether the errors are heteroskedastic or not. However, in the case of non-linear models it is usually the case that heteroskedasticity will lead to biased parameter estimates (unless you fix it explicitly somehow).

Stata is famous for providing Huber-White std. errors in most of their regression estimates, whether linear or non-linear. But this is nonsensical in the non-linear models since in these cases you would be consistently estimating the standard errors of inconsistent parameters.

This point and potential solutions to this problem are nicely discussed in Wooldridge's Econometric Analysis of Cross Section and Panel Data."

Amen to that!

Regrettably, it's not just Stata that encourages questionable practices in this respect. These same options are also available in EViews, for example.

Reference

Greene, W. H., 2012. Econometric Analysis. Prentice Hall, Upper Saddle River, NJ.

35 comments:

John, May 8, 2013 at 1:05 PM
This post focuses on how the MLE estimator for probit/logit models is biased in the presence of heteroskedasticity. Assume you know there is heteroskedasticity, what is the best approach to estimating the model if you know how the variance changes over time (is there a GLS version of probit/logit)? Is this also true for autocorrelation?

Jonah B. Gelbach, May 8, 2013 at 5:24 PM
In characterizing White's theoretical results on QMLE, Greene is of course right that "there is no guarantee the QMLE will converge to anything interesting or useful [note that the operative point here isn't the question of convergence, but rather the interestingness/usefulness of the converged-to object]."

But it is not crazy to think that the QMLE will converge to something like a weighted average of observation-specific coefficients (how crazy it is surely depends on the degree of mis-specification--suppose there is epsilon deviation from a correctly specified probit model, for example, in which case the QMLE would be so close to the MLE that sample variation would necessarily dominate mis-specification in any real-world empirical application). It would be a good thing for people to be more aware of the contingent nature of these approaches.

If, whenever you use the probit/logit/whatever-MLE, you believe that your model is perfectly correctly specified, and you are right in believing that, then I think your purism is defensible. If that's the case, then you should be sure to use every model specification test that has power in your context (do you do that? does anyone?).

Jorge Lara, May 9, 2013 at 3:17 AM
Dear David, would you please add the links to your blog when you discuss the linear probability model. You said "I've said my piece about this attitude previously (here and here), and I won't go over it again here."

But on here and here you forgot to add the links.

Thanks for that

Nicolas, May 9, 2013 at 3:56 AM
Dave -- there's a section in Deaton's Analysis of Household Surveys on this that has always confused me. He discusses the issue you raise in this post (his p. 85) and then goes on to say the following (pp. 85-86):

"The point of the previous paragraph is so obvious and so well understood that it is hardly of practical importance; the confounding of heteroskedasticity and "structure" is unlikely to lead to problems of interpretation. It is standard procedure in estimating dichotomous models to set the variance in (2.38) to be unity, and since it is clear that all that can be estimated is the effects of the covariates on the probability, it will usually be of no importance whether the mechanism works through the mean or the variance of the latent "regression" (2.38). While it is correct to say that probit or logit is inconsistent under heteroskedasticity, the inconsistency would only be a problem if the parameters of the function f were the parameters of interest. These parameters are identified only by the homoskedasticity assumption, so that the inconsistency result is both trivial and obvious."

I understand why we normalise the variance to 1, but I've never really understood Deaton's point as to why this makes the inconsistency result under heteroskedasticity "trivial" (he then states the same issue is more serious in, for instance, a tobit model). Do you perhaps have a view? (You can find the book here, in case you don't have a copy: http://documents.worldbank.org/curated/en/1997/07/694690/analysis-household-surveys-microeconometric-approach-development-policy)

Thanks for your blog posts, I learn a lot from them and they're useful for teaching as well.

DLM@SMU, May 9, 2013 at 6:44 AM
Two comments. First, while I have no stake in Stata, they have very smart econometricians there. I would not characterize them as "encouraging" any practice. They provide estimators and it is incumbent upon the user to make sure what he/she applies makes sense.

Second, there is one situation I am aware of (albeit not an expert) where robust standard errors seem to be called for after probit/logit and that is in the context of panel data. Wooldridge discusses in his text the use of a "pooled" probit/logit model when one believes one has correctly specified the marginal probability of y_it, but the likelihood is not the product of the marginals due to a lack of independence over time. Here, I believe he advocates a partial MLE procedure using a pooled probit model, but using robust standard errors.

marcel, May 9, 2013 at 7:20 AM
I've said my piece about this attitude previously (here and here)

You bolded, but did not put any links in this line. As it stands, it appears that you have not previously expressed yourself about this attitude. If you indeed have, please correct this so I can easily find what you've said.

Thanks

Dave Giles, May 9, 2013 at 8:45 AM
DLM - thanks for the good comments. You'll notice that the word "encouraging" was a quote, and that I also expressed the same reservation about EViews. I do worry a lot about the fact that there are many practitioners out there who treat these packages as "black boxes". It's hard to stop that, of course. Regarding your second point - yes, I agree. Thanks!

nottrampis, May 9, 2013 at 3:19 PM
David,

I do trust you are getting some new readers downunder and this week I have spelled your name correctly!!

Dan, May 9, 2013 at 3:38 PM
Great post! Grad student here. I like to consider myself one of those "applied econometricians" in training, and I had not considered this. Thank you, thank you, thank you. So obvious, so simple, so completely over-looked. The likelihood function depends on the CDFs, which are parameterized by the variance. An incorrect assumption about the variance leads to the wrong CDFs, and the wrong likelihood function. Hence, a potentially inconsistent estimator. How is this not a canonized part of every first year curriculum?!

ed, May 9, 2013 at 3:53 PM
I'm confused by the very notion of "heteroskedasticity" in a logit model.

The model I have in mind is one where the outcome Y is binary, and we are using the logit function to model the conditional mean: E(Y(t)|X(t)) = Lambda(beta*X(t)). We can rewrite this model as Y(t) = Lambda(beta*X(t)) + epsilon(t). But then epsilon is a centered Bernoulli variable with a known variance.

Of course the assumption about the variance will be wrong if the conditional mean is misspecified, but in this case you need to define what exactly you even mean by the estimator of beta being "consistent."

What am I missing here?

Anonymous, May 10, 2013 at 2:11 PM
Dave, thanks for this very good post! I have been looking for a discussion of this for quite some time, but I could not find clear and concisely outlined arguments as you provide them here. Thanks a lot for that, even though it is a bit disheartening that so many applied econometricians should be wrong...

However, please let me ask two follow up questions:

First: in one of your related posts you mention that looking at both robust and homoskedastic standard errors could be used as a crude rule of thumb to evaluate the appropriateness of the likelihood function. That is, when they differ, something is wrong. This simple comparison has also recently been suggested by Gary King (1). If I understood you correctly, then you are very critical of this approach. Do you have an opinion of how crude this approach is? Do you have any guess how big the error would be based on this approach? It is obvious that in the presence of heteroskedasticity, neither the robust nor the homoskedastic variances are consistent for the "true" one, implying that they could be relatively similar due to pure chance, but is this likely to happen?

Second: In a paper by Papke and Wooldridge (2) on fractional response models, which are very much like binary choice models, they propose an estimator based on the wrong likelihood function, together with robust standard errors to get rid of heteroskedasticity problems. Their argument that their estimation procedure yields consistent results relies on quasi-ML theory. While I have never really seen a discussion of this for the case of binary choice models, I more or less assumed that one could make similar arguments for them. I guess that my presumption was somewhat naive (and my background is far from sufficient to understand the theory behind the quasi-ML approach), but I am wondering why. Is there a fundamental difference that I overlooked?

Thanks a lot!

(1) http://gking.harvard.edu/files/gking/files/robust.pdf
(2) http://faculty.smu.edu/millimet/classes/eco6375/papers/papke%20wooldridge%201996.pdf

Anonymous, May 15, 2013 at 7:49 AM
You remark "This covariance estimator is still consistent, even if the errors are actually homoskedastic." (meaning, of course, the White heteroskedastic-consistent estimator). What about estimators of the covariance that are consistent with both heteroskedasticity and autocorrelation? Which ones are also consistent with homoskedasticity and no autocorrelation? I'm thinking about the Newey-West estimator and related ones. I would say the HAC estimators I've seen in the literature are not but would like to get your opinion.

I've read Greene and googled around for an answer to this question. The paper "Econometric Computing with HC and HAC Covariance Matrix Estimators" from JSS (http://www.jstatsoft.org/v11/i10/) is a very useful summary but doesn't answer the question either. I've also read a few of your blog posts such as http://davegiles.blogspot.com/2012/06/f-tests-based-on-hc-or-hac-covariance.html.

The King et al paper is very interesting and a useful check on simply accepting the output of a statistics package. However, we live with real data which was not collected with our models in mind. The data collection process distorts the data reported. Dealing with this is a judgement call, but accepting a model with problems is sometimes better than throwing up your hands and complaining about the data.

Please keep these posts coming. They are very helpful and illuminating. Thanks.

Martin Sanders, May 13, 2014 at 11:38 PM
Dear Professor Giles,

thanks a lot for this informative post. I think it is very important, so let me try to rephrase it to check whether I got it right: The main difference here is that OLS coefficients are unbiased and consistent even with heteroscedasticity present, while this is not necessarily the case for any ML estimates, right? And, yes, if my parameter coefficients are already false why would I be interested in their standard errors. My conclusion would be that - since heteroskedasticity is the rule rather than the exception and with ML mostly being QML - the use of the sandwich estimator is only sensible with OLS when I use real data. Am I right here?
Best wishes,
Martin

Anonymous, December 10, 2016 at 2:05 PM
Dear Professor Giles,
Could you please clear up the confusion in my mind: you state that the problem is for "the case of a model that is nonlinear in the parameters" but then you also state that "obvious examples of this are Logit and Probit models". But Logit and Probit are linear in parameters; they belong to a class of generalized linear models.
Thank you