Sunday, April 21, 2019

Recursions for the Moments of Some Discrete Distributions

You could say, "Moments maketh the distribution". While that's not quite true, it's pretty darn close.

The moments of a probability distribution provide key information about the underlying random variable's behaviour, and we use these moments for a multitude of purposes. Before proceeding, let's be sure that we're on the same page here.

Some background

Suppose that we have a random variable, X, whose distribution function is F(x), where x is some value of X. The following quote comes from an old blog post of mine:
"What is sometimes called "the problem of moments" tells us:
If all of the moments of a distribution exist, then knowledge of these moments is equivalent to knowledge of the distribution itself.
In other words, the moments completely define the distribution.
However, note the word, 'if' in the statement of the result above. And it's a very big 'if'! The problem is that for many distributions the moments exist only under certain conditions; and for some distributions some or all of the moments fail to be defined. In these cases, the "theorem" is of limited help. A sufficient condition for a distribution to completely and uniquely determined by its moments is that its moment generating function (m.g.f.) exists."
The rth "raw moment' (or '"moment about zero") for the random variable, X  is defined as μr' = E[r= ∫ xrdF(x) ; r = 1, 2, ..... 

We also use the "central (centered) moments" of X, which are defined as μr = E[(X - μ1)r] ; r = 1, 2, 3, ....... Strictly speaking, we're talking about "integer-order" moments in both cases here. (Remember, r = 1, 2, 3, ....) There are also measures called fractional moments and factorial moments, but we won't be concerned with those.

So, obviously, μ1' = E[X] is just the mean of X; and μ2 = E[(X - E(X))2] is the variance of X. We can always express the "raw moments" in terms of the "centered moments", and vice versa. For example, you'll be aware that Var.[X] = E[2] - [E(X)]2. That is, μ2 = μ2- (μ1')2.

In general, the relationships between these two forms of the moments can be obtained by noting that for some values, a and b,

                           E[(X - br] = Σ{[r! / (i! (r - i)!)] E[(X - a)(a - br-i } ,

where the range of summation is from i = 0 to i = r.

If we set a = 0, and bμ1', we can obtain the centered moments from the raw moments, as follows:

                           μr = Σ{[r! / (i! (r - i)!)] μi' (- μ1r-i } ,                                              (1)

where the range of summation is from i = 0 to i = r.

Similarly, if we set a =  μ1', and b = 0, we can obtain the raw moments from the centered moments:

                           μr' = Σ{[r! / (i! (r - i)!)] μi (μ1r-i } ,                                                (2)

where, again, the range of summation is from i = 0 to i = r.

Often, the first couple of moments of a distribution (i.e., of a random variable) may be all that interest us. For instance, the Mean Squared Error (MSE) of an estimator, θ*, of a parameter, θ, is just it's variance plus the square of its bias. In other words, MSE (θ*) = μ2 + (μ1' - θ)2, where the moments here are those of the sampling distribution of the estimator, θ*.

However, we  use the third and fourth (central) moments when defining skewness and kurtosis. For example, Pearson's definitions of these two measures are Skew (X) = μ3 (μ2)3/2, and Kurtosis (X) = μ4 / (μ2)2. Notice that both of these quantities are unitless. A good example of where we use them is in the construction of tests for Normality. For instance, the null hypothesis for the Jarque-Bera test is that the skewness = 0 and the kurtosis = 3. Only a Normal distribution has this property. So the J-B test, which is a Lagrange multipler ("score") test, is testing the validity of two restrictions on the parameters of the distribution. That's why the test statistic always has a null distribution that's Chi-square, with two degrees of freedom. 

Higher-order moments can be used to introduce additional shape parameters into various distributions. Sometimes the existence of up to the first eight moments is needed in the proofs of various results in statistics and econometrics.

Now look back at the basic definition of the raw moments, namely,  μr' = E[r= ∫ xrdF(x) ;  r = 1, 2, ..... ; and  μr = E[(X  - μ1') = ∫ (x -  μ1') dF(x) ; r = 1, 2, .....   

I'm not sure about you, but I don't think I'd want to be evaluating all of those integrals if there was an easier way to obtain the moments! And you probably know that there is. We can use the moment generating function (if it exists), or the cumulant generating function (which always exists). In each case, we have two options. One involving successively differentiating the m.g.f or the c.g.f.; and the other involves using a Taylor series expansion and equating coefficients.

Even then, for some distributions, obtaining the moments in these ways can be rather tedious - especially if you have reason to obtain the moments up to a fairly high order, 

Is there an even simpler way to proceed? I don't want to sound lazy, but computational efficiency is important.

Recursions, recursions, .......

Econometrics students should be pretty familiar the discussion so far. Now, let's turn to some other relationships between the moments of certain distributions. Usually, these relationships don't come up in your typical econometrics course, but they're very useful - especially from a computational viewpoint.

Specifically, these relationships are in the form of recursion formulae.

I'm sure you know what a recursion is. It's a formula that takes us forward by building on the previous steps. As a really trivial example of this, consider the formula for the factorial of an integer, n. You know that  n! = n (n-1)(n-2)(n-3).....3.2.1. So, we can write n! = n (n-1)! In other words, once we have computed (n-1)! we just multiply by n to move one set forward to n!, and so on.

Because 1! = 1, we have 2! = 2 (1!) = 2; 3! = 3(2!) = 6; 4! = 4(3!) = 24; etc.

If we proceed sequentially in this way, we just have one multiplication to do at each new step, instead of a whole lot of the. I know that isn't a big deal, and nor is there really that much computational cost-saving in this example. However, in more complex problems, recursion formulae can be very powerful tools.

Why is this interesting in the context of the moments of a random variable?

Such recursion formulae aren't available for all of the distributions that you're likely to encounter. (It turns out that most of them require membership of the so-called exponential families.) But there are some interesting ones for discrete distributions that are relevant in econometrics. My follow-up post will deal with similar results for some continuous distributions, where we'll see that there's also an important connection with Stein's Lemma

Binomial distribution

A couple of useful references here are Zhang et al. (2018) and Riordan (1937).

Strictly, this first example, and those that follow, aren't in the form of direct recursions - they're more like indirect recursions, but they each involve a neat "trick".

Suppose that X ~ Bi[(n - 1) , p] and Y ~ Bi[n , p]. Consider a function g(.) such that both |E[g(.)]| and |g(-1)| are finite. Then, 

                 E[g(X)] = E[Y g(Y - 1) / (np)] .

We can use this result to obtain the raw moments of Y, recursively. If we let g(Z) = Z k, for some Z, then:

                 E[k] = E[Y(Y - 1)k / (np)] .
  • If k = 0, then 1 = E[Y] / (np); and so μ1' = E[Y] = np. (Clearly, E[X] = (n - 1)p. )
  • If k = 1, then E[X] = E[Y(Y - 1) / (np)]; and so μ2' = E[2] =  μ1' + n(n - 1)p2 = np[1 + (n - 1)p].
(So, the variance of Y is μ2 = μ2' - (μ1')2 = np(1-p).
  • If k = 2, then E[2] = E[Y(Y - 1)2 / (np)]; and so μ3' = E[3] = 2μ2' + μ1' {(n - 1)p[1 + (n - 2)p - 1}] = n(n - 1)(n - 2)p3 + 3n(n - 1)p2 + np .
  • If k = 3, then we get μ4' = E[4] = (3 + np)E[3] − 3E[2] + E[Y] = np + 7n(n − 1)p2 + 6n(n − 1)(n − 2)p3 + n(n − 1)(n − 2)(n − 3)p4
and so on................... Notice that the rth. moment is expressed in terms of the preceding moments. And of course, if it's the centered moments that we want, then the general formula in (1) , above, allows us to calculate them easily from these raw moments.

As an aside, a different type of recursion formula for these moments is derived by Bényi and Manago (2005).

Poisson distribution

A useful reference here is Hwang (1982).

Let X follow a Poisson distribution with parameter λ. Consider a function g(X) such that both |E[g(X)]| and |g(-1)| are finite. Then,

                  E[λ g(X)] = E[X g(- 1)] .

If we let g(X) = k we have:

                 E[k] = (1 / λ) E[X (X - 1)k]                                                                (3)

and we can use this formula to generate the central moments of the Poisson distribution recursively.
  • If k = 0, then 1 = (1 / λ) E[X], or μ1' = E[X] = λ.
  • If k = 1, then E[X] = (1 / λ) {E[2] - E[X]}, and so μ2' = E[2] = (λ + 1) μ1' = λ(λ + 1).
(Immediately, the variance of X is μ2 = μ2' - (μ1')2 = λ.)
  • If k = 2, then E[2] = (1 / λ) E[X(X - 1)2], and so μ3' = E[3] = (λ + 2) μ2' - μ1' = λ3 + 3λ2 + λ.
  • If k= 3, then E[3] = (1 / λ) E[X(X - 1)3], and so μ4' = E[X 4] = (λ + 3)μ3' - 3μ2' + μ1' ; etc.
So, you see that once we have obtained μ1' to μr', we can easily obtain μr+1' for any r.

The moments of X about its mean, λ, can then be obtained directly from these raw moments, using (1).

Negative Binomial distribution

Again, see Hwang (1982).

First, note that there are several forms of the Negative Binomial distribution. Here, I'm defining it in terms of a random variable, X, that counts the number of failures before the rth. success, where p is the probability of a success. This includes one version of the Geometric Distribution as a special case (when r = 1).

Let X follow a Negative Binomial (NegBin) distribution with parameters r and p. Consider a function g(X) such that both |E[g(X)]| and |g(-1)| are finite. Then,

                  E[(1 - p) g(X)] = E[X g(X - 1) / (r + X - 1)] .

If we let g(X) = (X + r)k, then

                  E[(X + r)k] = (1 - p)-1 E[X(X + r - 1)k-1].

Following the same procedure as for the Poisson distribution, we can again obtain successively higher moments recursively from the lower moments:
  • If k = 1, then (1 - p)E(X + r) = E(X); and so μ1' = r (1 - p)/ p
  • If k = 2, then (1 - p)E[(X + r)2] = E(2) + rE(X) - E(X); and so μ2' = [r(1 - p) / p2][1 + r(1 - p)] = [μ1']2 + μ1' / p
(Right away, we see that Var.[X] = r(1 - p) / p2 = μ1' / p.)
  • If k = 3, then  (1 - p)E[(X + r)]3 = E[3] + 2(r - 1)E[2] + (r - 1)2E[X]; and so μ3' = {[r(1 - 3p) + 2] / p}μ2' + {[r2(2 - 3p) + 2r - 1] / p}μ1' + r3(1 - p) / p ; etc.
Again, once we have obtained μ1' to μr', we can easily obtain μr+1' for any r; and the central moments of X can be obtained from (1).

Obtaining "true" recursions

I noted earlier that one advantage of a recursion formula is that it can simplify and speed computations. From the discussion of the three distributions above, it may not be obvious that we can achieve this from what we've seen so far.

In fact, we can easily "extract" direct recursion formulae for the raw moments in each case. I'll use the Poisson distribution to illustrate this, and you should be able to apply the same approach to the other two distributions.

Let's go back to equation (3), where we saw that

                    E[k] = (1 / λ) E[X (X - 1)k]                                                                (3)

where k = 0, 1, 2, .....

From the binomial theorem,  X(X - 1)k = Σ[k! / (i! (k - i!))] X k-i+1 (-1), where the range of summation is from i = 0 to i = k. (Watch out - this range will change twice in what follows!)

So, from (3),

       E[] = (1 / λ) Σ[k! / (i! (k - i!))](-1)i E[k-i+1]

                  = (1 / λ) Σ[k! / (i! (k - i!))](-1)i E[k-i+1] + (1 / λ)E[k+1],                  (4)

where now the range of summation is from i = 1 to i = k.

Extracting the term involving i = 1 in the summation in (4), and re-arranging, we get:

        E[k+1]  = (λ + k)E[] - Σ[k! / (i! (k - i!))](-1)i E[k-i+1] ,                       (5)

where the range of summation in (5) is for i = 2 to i = k.

Setting k = 0, 1, 2, 3 we get the expressions for μ1', μ2', μ3', and μ4' for the Poisson distribution given earlier.

Of course, what we have in (5) is a nice tidy recursion formula that enables us to calculate the (k + 1)th raw moment as a function of all of the preceding moments.

A numerical example

We can easily write some code that will allow us to compute as many moments of the Poisson distribution as we wish, very efficiently. Subject to numerical "overflow" limitations, of course!

By way of a quick illustration, there's some R code (available on the code page of this blog) that speedily computes the first ten raw moments for the Poisson distribution with λ = 1. Here are the results:
The R code simulates the moments and plots the p.m.f. of the empirical distribution.

The recursion formula comes into its own if,for example, you want to compute the 20th. moment when λ = 3. The answer is 1.4 x1018. Try working that one out by hand!

Obviously, you can amend the R code to choose your own value of λ, and the number of moments that you want to calculate. 

The BIG takeaway from this post? If you want to to compute (lots of ) the moments of some common discrete distributions, you should look beyond the standard approaches. There are some interesting recursion formulae that might help you.

I have a follow-up post on its way that deals with similar results for some familiar continuous distributions. Watch out for it!


Bényi, Á. & S. M. Manago, 2005. A recursive formula for moments of a binomial distribution. College Mathematics Journal, 36, 68-72.

Hwang, J. T., 1982. Improving on standard estimators in discrete exponential families with applications to Poisson and negative binomial cases. Annals of Statistics, 10, 857–867.

Riordan, J., 1937. Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions. Annals of Mathematical Statistics, 8, 103-111.

Zhang, Y-Y., T-Z. Rong, & M-M. Li, 2018. Expectation identity for the binomial distribution and its application in the calculations of high-order binomial moments. Communications in Statistics - Theory and Methods, available online.

© 2019, David E. Giles

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.