Tuesday, August 9, 2011

Being Normal is Optional!

One of the cardinal rules of teaching is that you should never provide information that you know you're going to have to renege on in a later course. When you're teaching econometrics, I know that you can't possibly cover all of the details and nuances associated with key results when you present them at an introductory level. One of the tricks, though, is to try to present results in a way that doesn't leave the student with something that subsequently has to be "unlearned", because it's actually wrong.

If you're scratching your head, and wondering who on earth would be so silly as to teach something that has to be "unlearned", let me give you a really good example. You'll have encountered it a dozen times or more, I'm sure. You just have to pick up almost any econometrics textbook, at any level, and you'll come away with a big dose of misinformation regarding one of the standard assumptions that we make about the error term in a regression model. If this comes as news to you, then I'll have made my point!

Let's start off by considering a totally standard linear multiple regression model. I'll assume that the regressors are non-random (which is a little stronger than we really need, but that's irrelevant to my point), and the regressor matrix has full (column) rank. The complete set of assumptions about the random error term is that these errors have a zero mean; different values of the errors are uncorrelated with each other; they are homoskedastic (i.e., they come from a distribution with a constant variance); and they are normally distributed.
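For reference, here is the same set-up in the usual matrix notation. The symbols $y$, $X$, $\beta$, $\varepsilon$, $\sigma^2$, $n$ and $k$ are conventional labels introduced here, not ones used elsewhere in this post.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad X \ (n \times k) \text{ non-random}, \quad \operatorname{rank}(X) = k, \\
\mathrm{E}(\varepsilon) &= 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 I_n, \qquad \varepsilon \sim N(0,\ \sigma^2 I_n).
\end{aligned}
```

The first two conditions on $\varepsilon$ are the ones the Gauss-Markov Theorem uses; the last one is the normality assumption that the rest of this post is about.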

This set-up is basically a "fairy story", but that's part of what econometrics is about - seeing what happens when some of these very strong assumptions are relaxed; and seeing what can be done to recover key results when they fall apart.

By the time you've been through any basic course in econometrics, among other things you'll have had the following results hammered into you:

  1. If all of the above assumptions (except for that of normally distributed errors) hold, then the OLS estimator of the coefficient vector is "Best Linear Unbiased". This, of course, is just the Gauss-Markov Theorem.
  2. If all of the above assumptions, including that of normally distributed errors, are satisfied then the Gauss-Markov Theorem still holds. However, with this extra assumption we get a stronger result. In this case the OLS estimator of the coefficient vector is "Best Unbiased". That's to say, it is efficient in the class of all unbiased estimators, and not just in the more restrictive class of estimators that are (a) linear functions of the dependent variable; and (b) unbiased.
  3. If all of the above assumptions, including that of normally distributed errors, are satisfied then the OLS estimator of the coefficient vector in the model coincides with the maximum likelihood estimator (MLE) for this vector.
  4. The assumption that the errors in the regression model are normally distributed is what enables us to move beyond point estimation to the more important, and interesting, tasks of constructing interval estimates (confidence intervals) and tests of hypotheses for the model's parameters.
  5. More specifically, the reason the familiar "t-statistics" and "F-statistics" follow Student's t-distribution and the F-distribution is that we make the (possibly strong) assumption that the errors of the model are normally distributed.

The last two results will be ones that you probably take as being pretty basic. Right? 

Actually - WRONG!

Before explaining why, consider result 3 above.

As it's stated, result 3 is perfectly correct. Normality of the errors (together with the other assumptions) is sufficient for the OLS estimator and the MLE for the coefficient vector to coincide. But it's not a necessary condition. You'll probably know that if the assumption of a joint (multivariate) normal distribution is replaced, in this context, with one that the errors follow a multivariate Student-t distribution, then the MLE still coincides with the OLS estimator. We've known this for a long time - at least since Zellner (1976).

Notice that this result holds, of course, if the covariance matrix of this multivariate Student-t distribution is scalar, so that the errors are uncorrelated. However, uncorrelatedness does not imply independence, in general. (It does in the case of normality, but this is a rather special case.) Here, if the Student-t errors are uncorrelated, they are not independent. In fact, the equivalence of the MLE and the OLS estimator does not arise if the errors follow independent univariate Student-t distributions.
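The difference between "uncorrelated" and "independent" is easy to see by simulation. The sketch below is our own illustration, not taken from the references: it builds multivariate Student-t errors with scale matrix $I_n$ in the standard way, as a normal vector divided by a single $\sqrt{w/\nu}$ with $w \sim \chi^2_\nu$ shared by every element, and compares them with independent univariate Student-t errors. The choices $\nu = 10$, the sample size and the seed are arbitrary assumptions.

```python
import numpy as np

def mvt_errors(rng, reps, n, nu):
    """Multivariate Student-t draws with scale matrix I_n, built as a scale
    mixture of normals: all n elements of a draw share ONE chi-square variable."""
    z = rng.standard_normal((reps, n))
    w = rng.chisquare(nu, size=(reps, 1))   # one mixing draw per error vector
    return z / np.sqrt(w / nu)

def independent_t_errors(rng, reps, n, nu):
    """n independent univariate Student-t draws per row."""
    return rng.standard_t(nu, size=(reps, n))

def level_and_square_corr(e):
    """Sample correlations of (e1, e2) and of (e1^2, e2^2)."""
    c_level = np.corrcoef(e[:, 0], e[:, 1])[0, 1]
    c_square = np.corrcoef(e[:, 0] ** 2, e[:, 1] ** 2)[0, 1]
    return c_level, c_square

rng = np.random.default_rng(1976)
reps, nu = 200_000, 10
results = {
    "multivariate t": level_and_square_corr(mvt_errors(rng, reps, 2, nu)),
    "independent t": level_and_square_corr(independent_t_errors(rng, reps, 2, nu)),
}
for label, (c_level, c_square) in results.items():
    print(f"{label:15s} corr(e1, e2) = {c_level:+.3f}   corr(e1^2, e2^2) = {c_square:+.3f}")
```

Both kinds of error are (essentially) uncorrelated in levels, but only the multivariate-t errors have correlated squares: a large shared mixing variable inflates every element at once. For $\nu > 4$ the population correlation of the squares works out to $1/(\nu - 1)$, about 0.11 when $\nu = 10$.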

This aside about result 3 is actually revealing. What we're seeing is that we can relax the normality assumption, at least in some ways, and still get certain key results that you may have thought required normality. In fact, we can go further than allowing multivariate Student-t errors. If the error vector follows any distribution in the family of what are called "Elliptically Symmetric" distributions, then OLS and the MLE coincide. The multivariate normal and multivariate Student-t distributions are just particular members of this broader family of distributions - see Chmielewski (1981).
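One way to see why OLS and the MLE coincide is to write a spherical (elliptically symmetric, with scale matrix $\sigma^2 I_n$) density for the errors in terms of a "density generator" $g$. The sketch below assumes that $g$ is strictly decreasing, which holds for both the normal and the multivariate Student-t; the notation is ours, not taken from the references.

```latex
\begin{aligned}
f(\varepsilon) &= \sigma^{-n}\, g\!\left(\frac{\varepsilon'\varepsilon}{\sigma^2}\right)
\quad\Longrightarrow\quad
L(\beta, \sigma) = \sigma^{-n}\, g\!\left(\frac{S(\beta)}{\sigma^2}\right),
\qquad S(\beta) = (y - X\beta)'(y - X\beta), \\
\text{normal: } g(u) &\propto e^{-u/2}, \qquad
\text{multivariate Student-}t \text{ with } \nu \text{ d.f.: } g(u) \propto \left(1 + \frac{u}{\nu}\right)^{-(\nu + n)/2}.
\end{aligned}
```

For any fixed $\sigma$, the likelihood is largest where $S(\beta)$ is smallest, that is at $\hat\beta = (X'X)^{-1}X'y$, so the MLE of $\beta$ is the OLS estimator whatever the (decreasing) $g$ is. With independent univariate Student-t errors, the log-likelihood is instead a sum of terms in $\log\left(1 + e_i^2/(\nu\sigma^2)\right)$, which is not a function of $S(\beta)$ alone; that is why the equivalence breaks down in that case.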

Now, let's go back to results 4 and 5 above. Why are they actually expressed incorrectly? You can probably guess by now, but you may be surprised to learn that there's even more to come.

Result 4 is easily dealt with. Yes, to construct confidence intervals and tests we need to know the sampling distributions of the estimators for the parameters. (Of course, we could bootstrap the sampling distributions or the test statistics, but I'll assume we're talking about exact analytic results here.) This implies we need to know, or assume, the distribution of the errors in the model.

If the errors are normal, the sampling distribution of the OLS estimator of the coefficient vector is also normal; and the sampling distribution of the usual unbiased estimator of the error variance parameter is proportional to a Chi-Square distribution. From this information we can then show that the t-test and F-test statistics are respectively Student-t and F-distributed when the null hypothesis is true; and non-central Student-t and non-central F-distributed under the alternative hypothesis.
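Written out for a single coefficient $\beta_j$ and for $q$ linear restrictions $R\beta = r$ (standard notation; $c_{jj}$ denotes the $j$-th diagonal element of $(X'X)^{-1}$), the normal-errors results just described are:

```latex
\begin{aligned}
\hat\beta &= (X'X)^{-1}X'y \sim N\!\left(\beta,\ \sigma^2 (X'X)^{-1}\right), \qquad
s^2 = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{n - k}, \qquad
\frac{(n-k)\,s^2}{\sigma^2} \sim \chi^2_{n-k}, \\
t &= \frac{\hat\beta_j - \beta_j^{0}}{s\sqrt{c_{jj}}} \sim t_{n-k}
\quad \text{under } H_0\colon \beta_j = \beta_j^{0}, \\
F &= \frac{(R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)}{q\, s^2} \sim F_{q,\,n-k}
\quad \text{under } H_0\colon R\beta = r.
\end{aligned}
```

Under normality $\hat\beta$ and $s^2$ are also independent, which is what turns these ratios into Student-t and F variates; and in both ratios the unknown $\sigma$ cancels between numerator and denominator.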

But it's not that we have normally distributed errors, in particular, that enables us to construct tests and confidence intervals. It's simply having some specific form of distribution for the errors. Normality makes the subsequent analysis straightforward, but that's all!

So, when we look at result 5, it turns out that it's actually false. The correct result is as follows:

If the error vector in our regression model follows any distribution in the family of Elliptically Symmetric distributions, then the usual t-test and F-test statistics have the same distribution as they have when the errors are normally distributed!
That is, they are distributed as Student-t and F when the null hypothesis is true, and non-central Student-t and non-central F when the null is false.

As a specific example, if we are modelling financial data it's pretty common to reject the hypothesis of normally distributed errors in favour of errors whose distribution has fatter tails. Assuming that the errors follow a multivariate Student-t distribution is often very reasonable in such cases. If we make the latter assumption, we don't have to alter our tests for restrictions on the regression coefficients. They have exactly the same properties as usual, even in small samples.
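A quick Monte Carlo check of the null (size) part of this claim, in a small sample, might look like the sketch below. It is our own illustration: the design (intercept plus linear trend, $n = 10$), the heavy-tailed choice $\nu = 3$, the seed, and the hard-coded 5% two-sided critical value 2.306 for Student-t with $n - k = 8$ degrees of freedom are all assumptions made for the example.

```python
import numpy as np

def ols_t_stats(Y, X, beta0, j):
    """Usual OLS t-statistics for H0: beta_j = beta0[j], one per row of Y."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    B = Y @ (X @ xtx_inv)                  # row r holds beta-hat for Y[r]
    resid = Y - B @ X.T
    s2 = (resid ** 2).sum(axis=1) / (n - k)
    return (B[:, j] - beta0[j]) / np.sqrt(s2 * xtx_inv[j, j])

rng = np.random.default_rng(2011)
n, reps, nu = 10, 100_000, 3
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])  # intercept + trend
beta = np.array([1.0, 0.5])
crit = 2.306                               # 97.5% point of Student-t, 8 d.f.

z = rng.standard_normal((reps, n))
w = rng.chisquare(nu, size=(reps, 1))      # one mixing draw per error vector
errors = {
    "normal": rng.standard_normal((reps, n)),
    "multivariate t": z / np.sqrt(w / nu),
}
size = {}
for label, e in errors.items():
    t = ols_t_stats(X @ beta + e, X, beta, j=1)   # H0 is true: slope = 0.5
    size[label] = np.mean(np.abs(t) > crit)
    print(f"{label:15s} empirical size = {size[label]:.4f}")
```

Both empirical rejection rates should sit close to the nominal 5%, despite the multivariate-t errors having very fat tails ($\nu = 3$) and only 8 residual degrees of freedom.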

It seems to me that this sort of result is worth knowing! And yet, as I've said already, it's not something you'll find in your typical econometrics textbook. Typically, you come away thinking that result 5 above is "the truth", and now you're going to have to "unlearn" that result. Sorry, but it's for your own benefit!

At this point, you may be thinking, "O.K., this is a new result. I wouldn't expect to find it in my textbook."

Well, I have more news for you. We've known all about this since the late 1970s! It's not exactly new! (By the way, when were you born?)

In a nutshell, my former colleague Max King proved these, and other related, results in his Ph.D. dissertation (King, 1979). His key results were published in King (1980).

Building on earlier work by Kariya (1977) and Kariya and Eaton (1977), King proved the following very general result.

If the error vector in our regression model follows any distribution in the family of Elliptically Symmetric distributions, then any test statistic that is scale-invariant has the same null and alternative distributions as it has when the errors are normally distributed.

Because we usually don't know the scale parameter (the variance in the case of the normal distribution), we have to estimate it, and consequently all of our usual test statistics are indeed scale-invariant. This is true for the t-test and F-test statistics, of course, but it's also true for all of the usual tests for serial independence, homoskedasticity, and even for normality itself!

To give a couple of specific examples - the Durbin-Watson test statistic has the same null and alternative distributions if the errors are elliptically symmetric, as it has if the errors are normal. The same is true for the Chow test for structural breaks; and for the Goldfeld-Quandt test for heteroskedasticity.
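Why does scale-invariance do all the work? Replacing $\varepsilon$ by $c\varepsilon$ for any $c > 0$ leaves the Durbin-Watson statistic (a ratio of quadratic forms in the residuals) unchanged, and, when the null is true, rescales the t-statistic's numerator and denominator by the same $c$. A multivariate Student-t vector is exactly such a rescaling of a normal vector, with a random $c$; more generally, a spherically symmetric vector is a random length times a direction that is uniformly distributed on the unit sphere, and these statistics depend only on the direction. The minimal sketch below checks the fixed-$c$ version numerically; the design matrix, coefficients, seed and the values of $c$ are arbitrary choices of ours.

```python
import numpy as np

def ols_fit(y, X):
    """OLS coefficients and residuals."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat, y - X @ beta_hat

def t_stat(y, X, j, b0):
    """Usual t-statistic for H0: beta_j = b0."""
    n, k = X.shape
    beta_hat, e = ols_fit(y, X)
    s2 = e @ e / (n - k)
    return (beta_hat[j] - b0) / np.sqrt(s2 * np.linalg.inv(X.T @ X)[j, j])

def durbin_watson(y, X):
    """Durbin-Watson statistic computed from the OLS residuals."""
    _, e = ols_fit(y, X)
    return np.sum(np.diff(e) ** 2) / (e @ e)

rng = np.random.default_rng(7)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
beta = np.array([2.0, -1.0])
z = rng.standard_normal(n)

# The last c is 1/sqrt(w/5) with w ~ chi-square(5), which turns z into a
# multivariate Student-t draw with 5 degrees of freedom.
scales = [1.0, 0.01, 37.5, 1.0 / np.sqrt(rng.chisquare(5) / 5)]
stats = []
for c in scales:
    y = X @ beta + c * z          # H0: beta_1 = -1 is true in every case
    stats.append((t_stat(y, X, j=1, b0=beta[1]), durbin_watson(y, X)))
    print(f"c = {c:6.2f}: t = {stats[-1][0]:.6f}, DW = {stats[-1][1]:.6f}")
```

Every row should print the same t and DW values (up to rounding), which is the finite-sample mechanism behind the results above.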

Ironically, an implication of this result is that any scale-invariant test of the hypothesis that the errors are normally distributed, is actually a test of the hypothesis that the errors follow an elliptically symmetric distribution!
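To see the irony concretely, here is a small Monte Carlo sketch (our own illustration; $n = 50$, $\nu = 3$ and the seed are arbitrary). It uses the sample kurtosis of the OLS residuals, $m_4/m_2^2$, which is scale-invariant and is the kind of moment that normality tests such as Jarque-Bera are built from. Within a single sample, a multivariate Student-t vector is just a normal vector multiplied by one random constant, so it leaves no within-sample trace of its fat tails; independent Student-t errors with the same $\nu$ do.

```python
import numpy as np

def residual_kurtosis(errors, X):
    """Sample kurtosis m4 / m2^2 of the OLS residuals, one value per row.
    With an intercept in X the residuals have mean zero, so no centring is needed."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    E = errors @ M                                       # M is symmetric
    return np.mean(E ** 4, axis=1) / np.mean(E ** 2, axis=1) ** 2

rng = np.random.default_rng(1980)
n, reps, nu = 50, 20_000, 3
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])
w = rng.chisquare(nu, size=(reps, 1))                    # one draw per vector
errors = {
    "normal": rng.standard_normal((reps, n)),
    "multivariate t": rng.standard_normal((reps, n)) / np.sqrt(w / nu),
    "independent t": rng.standard_t(nu, size=(reps, n)),
}
mean_kurt = {label: residual_kurtosis(e, X).mean() for label, e in errors.items()}
for label, v in mean_kurt.items():
    print(f"{label:15s} mean residual kurtosis = {v:.3f}")
```

The normal and multivariate-t rows should agree up to simulation noise (their residual kurtosis has exactly the same distribution), while the independent-t row should be clearly larger.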

So, one lesson from all of this is that the assumption of normally distributed errors in a regression model is not as restrictive as you might have thought. Perhaps another lesson is that you shouldn't believe everything that you read in your typical textbook!

References


Chmielewski, M. A., 1981. Elliptically symmetric distributions: A review and bibliography. International Statistical Review, 49, 67-74.

Kariya, T., 1977. A robustness property of the test for serial correlation. Annals of Statistics, 5, 1212-1220.

Kariya, T., and Eaton, M. L., 1977. Robust tests for spherical symmetry. Annals of Statistics, 5, 206-215.

King, M. L., 1979. Some aspects of statistical inference in the linear regression model. Ph.D. dissertation, Department of Economics, University of Canterbury, New Zealand.

King, M. L., 1980. Robust tests for spherical symmetry and their application to least squares regression. Annals of Statistics, 8, 1265-1271.

Zellner, A., 1976. Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. Journal of the American Statistical Association, 71, 400-405.

© 2011, David E. Giles


  1. Yes - I thought it was well known that these statistics are pretty robust to violations of the normality assumption. Although from time to time referees would make the point that I hadn't tested for normality in the errors and I would simply think they didn't understand econometrics.

  2. Sinclair: Good point. Of course, there are some departures from normality that can cause problems, but there's a lot that won't. In addition, once the sample size is big enough one of the CLTs will usually take over, and none of the usual asymptotic tests require normality (or any other specific distribution). The nice thing about the results in this post is that they are EXACT in FINITE samples.

    Thanks for the comment.


  3. very interesting post -- thank you!!

  4. Anonymous - glad you found it helpful.


  5. You might appreciate the paper by Fraser and Ng (1980).


    They provide an alternate derivation of the main result from Zellner and then they show that the appropriate distribution for inference for the standard deviation does depend on the form of the spherical error distribution. Then they provide extensions to the comparable multivariate regression situation.

    The multivariate t distribution can be factored into conditional times marginal. See, for example, DAS Fraser 'Probability & Statistics' (1976) problem 20 page 175. Such factorizations reveal the complex nature of the lack of independence (in spite of the zero correlation).

    I am fairly sure there is a result showing that sphericity plus independence is a characterization of the normal. I cannot find it just now.

    Regression with independent t errors, while viable in principle, still seems to be held up by major numerical integration matters. Computationally intensive, indeed!

  6. Gordon: Thanks for the very helpful comment, and especially the ref. to the Fraser & Ng paper.


  7. Is this applicable to a VAR model or only regression analysis? I understand that each equation in a VAR model is still estimated by OLS.


  8. Anonymous: In a VAR model you have lagged values of the dependent variable entering as regressors, so this complicates any statements about the distribution of the t-stats in finite samples, even if the errors are normally distributed. In large samples, though, everything becomes normal, by the central limit theorem.

  9. Hello. After reading this post and some other things, a couple of questions came up that I hope you'll be able to address sometime in your blog. Being a graduate student with really basic econometric training, I'd like to be sure of not ending up doing something grossly incorrect as an eventual practitioner of econometrics.

    The questions are:
    1. Are finite variance and the requirement that the variables follow some distribution in the family of Elliptically Symmetric Distributions (ESD) the same requirement, or part of the same condition?
    2. Is it possible to test for this broad form of non-normality? There are many normality tests, but many of those are of low power and obviously wouldn't apply to the general case of ESD you're discussing here.
    Here (http://maint.ssrn.com/?abstract_id=1343042) Taleb attempts to show (not test) that it is highly implausible that some financial variables he chose belong to the ESD realm (as opposed to following power laws). I don't know if what he does is an adequate enough...