Tuesday, October 18, 2011

It's All in the Moments

On Thursday afternoons I'm usually to be found at the seminars organized by the Statistics group in our Department of Mathematics and Statistics. It's almost always the highlight of my week. Last week the Thursday gathering was replaced by a very special half-day event on the Friday: a mini-conference to honour the recent retirement of senior UVic statistician, Bill Reed.

Organized by Laura Cowen (UVic) and Charmaine Dean (SFU), under the auspices of the Pacific Institute for the Mathematical Sciences (PIMS), the conference was a fitting tribute to Bill and his many contributions. The great turn-out was a clear indication of the many professional friends that Bill has, and the regard in which he's held.

We were treated to two outstanding keynote addresses by long-standing colleagues of Bill. The first was by Jon Schnute (adjunct at UBC, Fisheries Centre). The second was by Michael Stephens (emeritus at SFU, Statistics & Actuarial Science). Both of the presentations were informative, engaging, and very well suited to the occasion.

One particular anecdote from Michael (whose seminal contributions to goodness-of-fit testing need no introduction) caught my attention. He recalled an occasion when a rather imposing Egon Pearson (think, Neyman-Pearson Lemma) lined him up and asked:

"So, what exactly are you working on, Stephens?"
Michael responded that he was trying to determine the complete (finite-sample) distribution for a particular goodness-of-fit statistic for testing directional data. Pearson's response was:
"Why not just work out the first four moments of the distribution, and then use my father's curves to fit the distribution itself?"
Dad, of course, was Karl Pearson! (We tend not to use "Pearson curves" very much these days, but there was a time when they played a major role in distribution theory. Michael subsequently published two important papers with Egon Pearson: Pearson and Stephens, 1962, 1964.)

This was good advice from Pearson Jr., because as Michael noted: if you can match the first four moments of a distribution well, then in most cases you're likely to get a really tight approximation to the full distribution.

What does this result rely on? Well, what is sometimes called "the problem of moments" tells us:
If all of the moments of a distribution exist, then knowledge of these moments is equivalent to knowledge of the distribution itself.
In other words, the moments completely define the distribution.

However, note the word, "if", in the statement of the result above. And it's a very big "if"! The problem is that for many distributions the moments exist only under certain conditions; and for some distributions some or all of the moments fail to be defined. In these cases, the "theorem" is of limited help. (Strictly, existence of all of the moments is not quite enough on its own: the log-normal distribution has moments of every order, but it is not uniquely determined by them.) A sufficient condition for a distribution to be completely and uniquely determined by its moments is that its moment generating function (m.g.f.) exists in a neighbourhood of zero.

[The fact that this existence frequently fails is precisely why we usually work with the characteristic function (c.f.), rather than the m.g.f., in distribution theory. The c.f. is always defined, by construction.]
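
To see why the c.f. is always defined while the m.g.f. may not be, it helps to put the two definitions side by side. The symbols M_X and φ_X below are just my labels for the m.g.f. and the c.f. of a random variable X with distribution function F; treat this as a sketch of the standard argument rather than a full proof.

```latex
M_X(t) = E\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx}\, dF(x),
\qquad
\varphi_X(t) = E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx}\, dF(x).

\left|\varphi_X(t)\right|
  \le \int_{-\infty}^{\infty} \left|e^{itx}\right| dF(x)
  = \int_{-\infty}^{\infty} 1 \, dF(x)
  = 1
  \quad \text{for all real } t.
```

The integrand of the c.f. has modulus one, so the integral always converges. The integrand of the m.g.f., e^{tx}, is unbounded in x, so for a distribution with sufficiently heavy tails the integral can diverge for every t other than zero.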

When we say "moments do not exist", or "moments fail to be defined", what do we actually mean? Well, consider the definition of the rth moment (about zero) for a random variable, X. This moment is defined as $\mu_r' = E[X^r] = \int x^r \, dF(x)$. If this integral diverges, then the moment is not defined.

What are some examples of the non-existence of moments? Actually, we can illustrate this with a couple of really basic distributions that all econometrics students are familiar with. First, consider the Student-t distribution with v degrees of freedom. The rth moment of this distribution exists only if r < v (r = 1, 2, 3, ...). Note that when v = 1, the Student-t distribution collapses to the Cauchy distribution, none of whose moments exist.
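
A short worked argument shows where the condition r < v comes from. Here f_v denotes the Student-t density with v degrees of freedom, and c_v and C are normalizing constants whose exact values don't matter for the argument; this notation is mine, introduced only for the sketch.

```latex
f_v(x) = c_v \left(1 + \frac{x^2}{v}\right)^{-(v+1)/2}
\;\sim\; C\,|x|^{-(v+1)} \quad \text{as } |x| \to \infty,

\text{so} \quad |x|^r f_v(x) \;\sim\; C\,|x|^{\,r-v-1},
\qquad
\int_1^{\infty} x^{\,r-v-1}\, dx < \infty
\;\Longleftrightarrow\; r - v - 1 < -1
\;\Longleftrightarrow\; r < v.

\text{Cauchy case } (v = 1,\ r = 1): \quad
\int_{-A}^{A} \frac{|x|}{\pi\,(1 + x^2)}\, dx
  = \frac{1}{\pi}\,\ln\!\left(1 + A^2\right)
  \;\longrightarrow\; \infty \quad \text{as } A \to \infty.
```

So even the first absolute moment of the Cauchy distribution diverges, albeit only logarithmically, which is why its mean is undefined rather than zero.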

Another extreme situation arises if we have a Normal random variable, Z, and we form Y = (1 / Z). None of the moments of the new random variable, Y, exist. Its mean is not defined, its second moment is infinite, etc. Why might this example be interesting to an econometrician?

Well, suppose that we have a standard linear regression model, with normally distributed errors, and we estimate the coefficients by OLS. Suppose we have reason to focus on the reciprocal of the jth coefficient, and the OLS estimator of this coefficient is bj, say. With normal errors, and from the fact that the marginal distributions of a multivariate normal distribution are also normal, bj is normally distributed. It follows that (1 / bj) is the reciprocal of a normal random variable, and so none of its moments exist. So, it makes no sense to even ask the question, "what is the bias of (1 / bj)?" If the mean of (1 / bj) isn't defined, then neither is its bias. Likewise, it makes no sense to try and report a standard error for (1 / bj), say by appealing to the delta method. Why? Because a standard error is an estimated standard deviation, and the latter isn't defined if the second moment doesn't exist.

Trying to "estimate" something that doesn't exist is pretty pointless!
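
A small simulation makes the point concrete. The following is a minimal Python sketch using only the standard library. It assumes, purely for illustration, that bj is distributed as N(1, 0.25); the function names, the sample size and the seeds are all hypothetical choices of mine. It compares the sample mean of (1 / bj) with its sample median across independent replications.

```python
import random
import statistics


def simulate_reciprocal(n, mu=1.0, sigma=0.5, seed=0):
    """Draw n values of 1/b, where b ~ N(mu, sigma^2) stands in for an OLS coefficient.

    The values of mu and sigma are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    return [1.0 / rng.gauss(mu, sigma) for _ in range(n)]


def summarize(n=100_000, seeds=(1, 2, 3, 4, 5)):
    """Return a (sample mean, sample median) pair of 1/b for each replication."""
    results = []
    for seed in seeds:
        draws = simulate_reciprocal(n, seed=seed)
        results.append((statistics.fmean(draws), statistics.median(draws)))
    return results


if __name__ == "__main__":
    for mean, median in summarize():
        print(f"mean = {mean:12.4f}   median = {median:8.4f}")
```

The median of (1 / bj) is a well-defined population quantity, so the sample medians should agree closely from one replication to the next. The sample means are "estimating" an expectation that doesn't exist, so there is no reason for them to settle down, no matter how large n is made.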

The same thing applies if we are interested in estimating the ratio of two of the regression coefficients, and construct the estimator, (bj / bk), say. None of the moments of this estimator are defined either.
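
The simplest special case of this is worth seeing in action: the ratio of two independent standard normal variables follows a standard Cauchy distribution, whose quartiles are -1, 0 and 1. The Python sketch below (standard library only) checks this by simulation. Note the assumptions: real OLS coefficients generally have non-zero means and are correlated, so independent N(0, 1) draws are just a convenient stand-in, and the function names are hypothetical.

```python
import random
import statistics


def ratio_draws(n, seed=0):
    """Draw n ratios num/den, with num and den independent N(0, 1).

    This is an illustrative special case; actual OLS coefficients
    would usually have non-zero means and be correlated.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        num = rng.gauss(0.0, 1.0)
        den = rng.gauss(0.0, 1.0)
        draws.append(num / den)
    return draws


def quartiles(draws):
    """Return the sample quartiles (Q1, median, Q3)."""
    q1, q2, q3 = statistics.quantiles(draws, n=4)
    return q1, q2, q3


if __name__ == "__main__":
    q1, q2, q3 = quartiles(ratio_draws(200_000, seed=42))
    print(f"Q1 = {q1:.3f}, median = {q2:.3f}, Q3 = {q3:.3f}")
```

Quantiles are a sensible way to describe the location and spread of such an estimator, precisely because they exist even when the mean and variance do not.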

Other important examples of this problem arise throughout econometrics. For instance, there are some good examples in the context of estimators for simultaneous equations models (SEMs). The existence of the moments of the 2SLS estimator (and of many other IV estimators) depends on the degree of over-identification for the equation in question. The FIML estimator for the structural form of an SEM also has issues with existence of moments, as I described here.

The take-home message here is that you need to keep an eye on the existence, or otherwise, of the moments of any statistic you're working with. I can recall listening to several conference presentations in the late 1970s where the authors were using Monte Carlo methods to simulate the bias and MSE of various SEM estimators, but with experimental designs such that neither of the first two moments of the estimators existed. You wouldn't believe the convoluted explanations that they gave to try and "justify" the rather weird results they obtained!

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

References


Pearson, E. S. and M. A. Stephens (1962). The goodness-of-fit tests based on $W^2$ and $U^2$. Biometrika, 49, 397-402.

Pearson, E. S. and M. A. Stephens (1964). The ratio of range to standard deviation in the same normal sample. Biometrika, 51, 484-487.

© 2011, David E. Giles


  1. Dear Prof. Giles:
    You might consider using a font with better contrast to your background.
    It is sort of childish, but once you get past the frat sophomore humor, www.webpagesthatsuck.com has good advice; you can find free browser plugins to check contrast (they don't always work but give clue
    regards,ezra s abrams

  2. Ezra: Thanks for the suggestion - done!

  3. Dear Prof. Giles:

    Maybe you already know it. A good reference which addresses this issue is Zellner (1978):

    Zellner, A. (1978), "Estimation of functions of population means and regression coefficients including structural coefficients: a minimum expected loss approach", Journal of Econometrics 8, 127-158.