## Friday, May 17, 2013

### What's the Variance of a Sample Variance?

This post is really pitched at students who are taking a course or two in introductory economic statistics. It relates to a couple of estimators of the variance of a population that we all meet in such courses - plus another one that you might not have met. In addition, I'll be emphasising the fact that some "standard" results depend crucially on certain assumptions. That's not surprising, but it's not always made clear by instructors and textbooks.

To begin with, let's consider a standard problem. We have a population that is Normal, with a mean of $\mu$ and a variance of $\sigma^2$. We take a sample of size $n$, using simple random sampling. Then we form the simple arithmetic mean of the sample values: $x^* = (1/n)\sum x_i$, where the range of summation (here and everywhere below) is from 1 to $n$.

Under my assumptions, we know that the sampling distribution of $x^*$ is $N[\mu, \sigma^2/n]$. The Normality of the sampling distribution follows from the Normality of the population, and the fact that $x^*$ is a linear function of the data. The variance of the sampling distribution stated above is correct only because simple random sampling has been used.

Now, let's get to what I'm really interested in here - estimating $\sigma^2$. We all learn that the mean squared deviation of the sample, $\sigma^{*2} = (1/n)\sum (x_i - x^*)^2$, is a (downward-) biased estimator of $\sigma^2$. If we allow for the fact that we've actually lost one degree of freedom by estimating $\mu$ using $x^*$, then an unbiased estimator of $\sigma^2$ is $s^2 = [1/(n-1)]\sum (x_i - x^*)^2$.
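A quick way to see the bias is to simulate it. The sketch below is only an illustration: the function name is hypothetical, and the sample size, population variance, number of replications and seed are arbitrary values chosen for this example. It draws many Normal samples and averages each of the two estimators across them.

```python
import random

def average_variance_estimates(n=5, sigma2=4.0, reps=50_000, seed=12345):
    """Average the divide-by-n and divide-by-(n-1) estimators over many samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    sigma = sigma2 ** 0.5
    total_msd = 0.0                    # running sum of the divide-by-n estimator
    total_s2 = 0.0                     # running sum of the divide-by-(n-1) estimator
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
        total_msd += ss / n
        total_s2 += ss / (n - 1)
    return total_msd / reps, total_s2 / reps

mean_msd, mean_s2 = average_variance_estimates()
print(f"average mean squared deviation: {mean_msd:.3f}  (theory: (n-1)/n * sigma2 = 3.2)")
print(f"average s2:                     {mean_s2:.3f}  (theory: sigma2 = 4.0)")
```

With these settings the first average should sit near 3.2 and the second near 4.0, up to simulation noise.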

O.K., now what does the sampling distribution of $s^2$ look like?

Well, under the assumptions I've made, including the Normality of the population, $s^2$ has a sampling distribution that is proportional to a Chi-square distribution. More specifically, the statistic $c = (n-1)s^2/\sigma^2$ is Chi-square with $(n-1)$ degrees of freedom.

[As an aside, $s^2$ and $x^*$ are independently distributed if and only if the population is Normal. The "only if" part of this statement is due to the Irish statistician, Geary - see Geary (1936).]

So, we now know something, indirectly, about the sampling distribution of $s^2$, and we know that $E[s^2] = \sigma^2$. What is the variance of $s^2$?

Because we're assuming a Normal population, implying that the statistic I've called "c" follows a Chi-square distribution, we can use the result that the variance of a Chi-square random variable equals twice its degrees of freedom.

Re-arranging the formula for "c", we can write: $s^2 = c\sigma^2/(n-1)$.

Then, $\mathrm{Var}(s^2) = [\sigma^2/(n-1)]^2\,\mathrm{Var}(c) = [\sigma^4/(n-1)^2]\cdot 2(n-1) = 2\sigma^4/(n-1)$.

[As another aside, the mean of a Chi-square random variable equals its degrees of freedom, so applying this result to "c" and re-arranging, we immediately get the result that $E[s^2] = \sigma^2$. However, we know this already, and this result holds even if the data are non-Normal.]
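The variance result for the Normal case can also be checked by simulation. This is a minimal sketch under assumed settings (n = 10, a unit population variance, 100,000 replications and a fixed seed, none of which come from the discussion above; the function name is hypothetical): it computes the sample variance for many Normal samples and compares the variance of those values with 2σ⁴ / (n - 1).

```python
import random

def simulated_var_of_s2(n=10, sigma2=1.0, reps=100_000, seed=2013):
    """Monte Carlo estimate of Var(s2) when the population is Normal."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    s2_values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2_values.append(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    mean_s2 = sum(s2_values) / reps
    # Variance of the simulated s2 values across replications.
    return sum((v - mean_s2) ** 2 for v in s2_values) / (reps - 1)

n, sigma2 = 10, 1.0
simulated = simulated_var_of_s2(n=n, sigma2=sigma2)
theoretical = 2 * sigma2 ** 2 / (n - 1)      # 2 * sigma^4 / (n - 1)
print(f"simulated Var(s2):   {simulated:.4f}")
print(f"theoretical Var(s2): {theoretical:.4f}")
```

The two printed numbers should agree to within simulation error; they will not match exactly.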

Now, this is as far as things usually go in an introductory economic statistics course. To sum up:
• $E[s^2] = \sigma^2$
• $c = (n-1)s^2/\sigma^2 \sim \chi^2(n-1)$
• $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$

Notice that $\mathrm{Var}(s^2)$ vanishes when $n$ grows very large. This, together with the first result above, implies that $s^2$ is a (mean-square) consistent estimator of $\sigma^2$.

Unfortunately, students often don't realize that the second and third of these results rely on both simple random sampling and the Normality of the population.

A thoughtful student will notice that the first result holds even if the data are non-Normal, and will ask, "what's the variance of $s^2$ if the population isn't Normal?" That's a good question!

To answer it, let's introduce an important concept - the "moments" of a probability distribution. Let $X$ be a random variable. Then $E[X^k]$ is called the $k$th "raw moment" (or, moment about zero) of the distribution of $X$. (Here, "k" is a positive integer, but more generally we can allow k to be negative, or a fraction.) Let's denote the $k$th such moment by $\mu'_k$. So, the first raw moment is just the population mean. That is, $\mu'_1 = \mu$.

Then consider the quantities $\mu_k = E[(X - \mu)^k]$, for $k = 1, 2, 3, \ldots$ We call these the "centered moments" (or central moments) of the distribution of $X$. You'll notice that $\mu_2$ is just the population variance. The third and fourth centered moments are used (together with $\mu_2$) to construct measures of skewness and kurtosis, but that's another story.
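To make the two definitions concrete, here is a small sketch that computes raw and centered moments exactly for a discrete distribution. The fair six-sided die is just an illustrative population chosen for this example, and the helper names are hypothetical.

```python
from fractions import Fraction

def raw_moment(values, probs, k):
    """k-th raw moment: E[X^k]."""
    return sum(p * x ** k for x, p in zip(values, probs))

def central_moment(values, probs, k):
    """k-th centered moment: E[(X - mu)^k], where mu is the first raw moment."""
    mu = raw_moment(values, probs, 1)
    return sum(p * (x - mu) ** k for x, p in zip(values, probs))

# Illustrative population (an assumption for this example): a fair six-sided die.
faces = [Fraction(i) for i in range(1, 7)]
probs = [Fraction(1, 6)] * 6

mu = raw_moment(faces, probs, 1)        # population mean
mu1 = central_moment(faces, probs, 1)   # always zero by construction
mu2 = central_moment(faces, probs, 2)   # population variance
mu4 = central_moment(faces, probs, 4)   # fourth centered moment
print("mean:", mu, "| mu_1:", mu1, "| mu_2:", mu2, "| mu_4:", mu4)
```

Using `Fraction` keeps the arithmetic exact, so the first centered moment comes out as exactly zero rather than a tiny rounding residue.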

By the way, there's an important detail. The expectations involved in the construction of the moments require forming an integral. If that integral diverges, the corresponding moment isn't defined. For instance, the kth moment for a Student's-t distribution with v degrees of freedom exists only if v > k. In the case of the Cauchy distribution (which is just a Student's-t distribution with v = 1), none of the moments exist!

Alright - back to the question in hand! What is the variance of $s^2$ if the population is non-Normal? The answer, in the case of simple random sampling, is:

$$\mathrm{Var}(s^2) = \frac{1}{n}\left[\mu_4 - \frac{n-3}{n-1}\,\mu_2^2\right].$$

If the population is Normal, then $\mu_4 = 3\sigma^4$, and $\mu_2^2 = \sigma^4$. So, we get $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$, in this case.
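For anyone who wants the intermediate algebra, the Normal special case works out as follows. Nothing is used here beyond substituting the two Normal moments into the general expression:

$$
\begin{aligned}
\mathrm{Var}(s^2) &= \frac{1}{n}\left[3\sigma^4 - \sigma^4\,\frac{n-3}{n-1}\right] \\
&= \frac{\sigma^4}{n}\cdot\frac{3(n-1) - (n-3)}{n-1} \\
&= \frac{\sigma^4}{n}\cdot\frac{2n}{n-1} \;=\; \frac{2\sigma^4}{n-1}.
\end{aligned}
$$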

Notice that this more general expression for $\mathrm{Var}(s^2)$ also vanishes as $n$ grows. So, a pair of sufficient conditions for the mean-square consistency of $s^2$ (as an estimator of $\sigma^2$) is:
1. The data are obtained using simple random sampling;
2. At least the first 4 moments of the population distribution exist.

We can easily work out the expressions for $\mathrm{Var}(s^2)$ in the case where the population follows some other distributions that you may have heard about. Here are just a few illustrative results:

Uniform, continuous on $[a, b]$

$\mu_2 = (b-a)^2/12$  ;  $\mu_4 = (b-a)^4/80$
$\mathrm{Var}(s^2) = (2n+3)(b-a)^4 / [360\,n(n-1)]$

Standard Student's-t, with $v$ degrees of freedom

$\mu_2 = v/(v-2)$  ;  $\mu_4 = 3v^2 / [(v-2)(v-4)]$
$\mathrm{Var}(s^2) = [2v^2(nv - n - 3)] / [n(n-1)(v-2)^2(v-4)]$  ;  for $v > 4$

$\chi^2$, with $v$ degrees of freedom

$\mu_2 = 2v$  ;  $\mu_4 = 12v(v+4)$
$\mathrm{Var}(s^2) = [8v(nv + 6n - 6)] / [n(n-1)]$

Exponential, with mean $\theta$

$\mu_2 = \theta^2$  ;  $\mu_4 = 9\theta^4$
$\mathrm{Var}(s^2) = [2(4n-3)\theta^4] / [n(n-1)]$

Poisson, with parameter $\lambda$

$\mu_2 = \lambda$  ;  $\mu_4 = \lambda(3\lambda + 1)$
$\mathrm{Var}(s^2) = 2\lambda^2/(n-1) + \lambda/n$
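Closed-form results like these are easy to get wrong by a stray constant, so it's worth checking them against the general formula. The sketch below does that with exact rational arithmetic. The parameter values and the sample size are arbitrary choices for the check, and the function name is hypothetical. For each distribution it feeds the second and fourth centered moments into the general expression and compares the answer with the closed form written in the code.

```python
from fractions import Fraction as F

def var_s2_general(mu2, mu4, n):
    """General result: Var(s2) = (1/n) * [mu4 - mu2^2 * (n - 3)/(n - 1)]."""
    return (mu4 - mu2 ** 2 * F(n - 3, n - 1)) / n

n = 7                      # arbitrary sample size for the check
a, b = F(2), F(5)          # Uniform support [a, b]
v_t = F(7)                 # Student's-t degrees of freedom (needs v > 4)
v_chi = F(3)               # Chi-square degrees of freedom
theta = F(2)               # Exponential mean
lam = F(3)                 # Poisson parameter

# Each entry: (general formula fed with mu2 and mu4, closed-form expression).
checks = {
    "uniform": (
        var_s2_general((b - a) ** 2 / 12, (b - a) ** 4 / 80, n),
        (2 * n + 3) * (b - a) ** 4 / (360 * n * (n - 1)),
    ),
    "student-t": (
        var_s2_general(v_t / (v_t - 2), 3 * v_t ** 2 / ((v_t - 2) * (v_t - 4)), n),
        2 * v_t ** 2 * (n * v_t - n - 3) / (n * (n - 1) * (v_t - 2) ** 2 * (v_t - 4)),
    ),
    "chi-square": (
        var_s2_general(2 * v_chi, 12 * v_chi * (v_chi + 4), n),
        8 * v_chi * (n * v_chi + 6 * n - 6) / (n * (n - 1)),
    ),
    "exponential": (
        var_s2_general(theta ** 2, 9 * theta ** 4, n),
        2 * (4 * n - 3) * theta ** 4 / (n * (n - 1)),
    ),
    "poisson": (
        var_s2_general(lam, lam * (3 * lam + 1), n),
        2 * lam ** 2 / (n - 1) + lam / n,
    ),
}

for name, (general, closed) in checks.items():
    print(f"{name:12s} general = {general}  closed form = {closed}  match = {general == closed}")
```

Because everything is a `Fraction`, a match here is an exact equality at the chosen parameter values, not agreement up to rounding.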

Keep in mind that in each of these cases, the sampling distribution of $c = (n-1)s^2/\sigma^2$ will no longer be a $\chi^2$ distribution! Given our assumption of simple random sampling, you should be able to convince yourself that the asymptotic sampling distribution of "c", once it is suitably centered and scaled, will be Normal.
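Both points can be seen in a rough simulation. The sketch below assumes a Uniform population on [0, 1], n = 200, 5,000 replications and a fixed seed; all of these are illustrative choices, and the function name is hypothetical. It compares the simulated variance of "c" with the value 2(n - 1) that a Chi-square distribution would imply, and then checks how often the standardized statistic falls inside ±1.96.

```python
import random

def simulate_c_uniform(n=200, reps=5_000, seed=81):
    """Simulate c = (n - 1) * s2 / sigma2 for samples from a Uniform(0, 1) population."""
    rng = random.Random(seed)
    sigma2 = 1.0 / 12.0                      # variance of Uniform(0, 1)
    c_values = []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        c_values.append((n - 1) * s2 / sigma2)
    return c_values

n = 200
c_values = simulate_c_uniform(n=n)
mean_c = sum(c_values) / len(c_values)
var_c = sum((c - mean_c) ** 2 for c in c_values) / (len(c_values) - 1)

# Exact Var(c) implied by the Uniform result: (n - 1)^2 * Var(s2) / sigma^4.
exact_var_c = (n - 1) * (2 * n + 3) * 144 / (360 * n)
# Share of standardized c values inside +/-1.96 (about 0.95 if roughly Normal).
sd_c = exact_var_c ** 0.5
share = sum(abs((c - (n - 1)) / sd_c) < 1.96 for c in c_values) / len(c_values)

print(f"simulated Var(c): {var_c:.1f}")
print(f"exact Var(c) from the Uniform formula: {exact_var_c:.1f}")
print(f"Var(c) if c were chi-square: {2 * (n - 1)}")
print(f"share of standardized c within +/-1.96: {share:.3f}")
```

In this setting the variance of "c" is well below 2(n - 1), which is one symptom of the Chi-square result failing, while the standardized statistic behaves roughly like a standard Normal variable.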

References

Cho, E. & M. J. Cho (2008). Variance of sample variance. Proceedings of the 2008 Joint Statistical Meetings, Section on Survey Research Methods, American Statistical Association, Washington DC, 1291-1293.

Geary, R. C. (1936). The distribution of "Student's" ratio for non-normal samples. Supplement to the Journal of the Royal Statistical Society, 3, 178-184.

1. Hi there! Excellent post!
In case you might be interested, I collected some detailed derivations of the variance of the sample variance and its distribution in my blog at http://www.statlect.com/variance_estimation.htm

1. Thanks - and the link to your page is much appreciated!
DG

2. Very helpful, thank you. I think you may want to double check the result for the exponential distribution... I think a simple arithmetic error was made when substituting in the 2nd and 4th moments.

1. Thanks - now fixed.

3. Very useful page, thank you. I am just a bit confused about the example using the uniform distribution. Isn't 9/5 the value of kurtosis in the uniform distribution?
Thank you.
Claudio

1. Yes, when the support is [0 , 1]. I've corrected the expression for the fourth central moment. Thanks!

4. Thanks for the article. Quick correction:

The 4th central moment of the chi-squared distribution is: 12*v*(v+4)

http://mathworld.wolfram.com/Chi-SquaredDistribution.html

After I made that correction I was well on my way with the rest of the info you provided. Best regards!

1. Thanks a lot! Fixed - along with a couple of other typos.

5. Thank you for the article, it helped a lot with my thesis.

The only thing that confused me a little is the variance of the sample variance for the Poisson distribution. Shouldn't it be λ/n+2λ^2/(n-1)?

Thank you :)

1. Thank you for pointing out this error. You are correct. The second central moment is Lambda, and the fourth central moment is (Lambda(1 + 3 Lambda)), giving the result that you stated. I have changed the text accordingly.

6. This article helped me a lot. Thank you! Should we use the Finite Population Correction Factor (FPC) when we sample without replacement?