Let's imagine that you're teaching an econometrics class that features hypothesis testing. It may be an elementary introduction to the topic itself; or it may be a more detailed discussion of a particular testing problem. We're not talking here about a course on Bayesian econometrics, so in all likelihood you'll be following the "classical" Neyman-Pearson paradigm.
You set up the null and alternative hypotheses. You introduce the idea of a test statistic, and hopefully, you explain why we try to find one that's "pivotal". You talk about Type I and Type II errors; and the trade-off between the probabilities of these errors occurring.
You might talk about the idea of assigning a significance level for the test in advance of implementing it; or you might talk about p-values. In either case, you have to emphasize to the class that in order to apply the test itself, you have to know the sampling distribution of your test statistic for the situation where the null hypothesis is true.
Why is this?
If you're going to assign a significance level, you need this information because that's how you determine the "critical value" that partitions the rejection region from the non-rejection region. Just recall that the significance level is the (conditional) probability of rejecting the null hypothesis, H0, when in fact it is true.
If you're going to compute a p-value, you also need to know the distribution of your test statistic when H0 is true. Remember that the p-value is the probability of observing a value for the test statistic that's at least as "extreme" as the value you've computed using your sample data, if H0 is true.
Depending on the testing problem under discussion, you probably end up telling the class (or, better yet, proving formally) that the null distribution of the test statistic is Chi-square, Student-t, F, etc.
That's great, but do you ever discuss what distribution the test statistic follows if H0 is false?
Does your favourite econometrics textbook provide the details relating to this situation?
Has a student ever asked you: "What distribution does the Wald test (or the LM test, Chow test, Hausman test, ... fill in the gap) statistic follow when H0 is false?"
What students need to be told is that when they're working with the Chi-square, F, or Student-t distributions, these are all just special cases of their more general counterparts - the non-central Chi-square, non-central F, and non-central Student-t distributions. They're the special cases that arise if the non-centrality parameter is set to zero.
(There's also a non-central Beta distribution, and a doubly-non-central F distribution, but I'm not going to worry about those here.)
To illustrate what's going on here, consider the following well-known theorem:
"If x is a random n-vector that is N[0 , V], and A is a non-random (n x n) matrix, then the quadratic form, x'Ax, is Chi-square distributed with v = rank(A) degrees of freedom, if and only if AV is idempotent."
In fact, this is a special case of the following more general result:
"If x is a random n-vector that is N[μ , V], and A is a non-random (n x n) matrix, then the quadratic form, x'Ax, is non-central Chi-square distributed with v = rank(A) degrees of freedom, and non-centrality parameter λ = ½ (μ'Aμ), if and only if AV is idempotent."
Straightforward proofs of both of these results can be found in Searle (1971, p.57), for example.
(Warning! The literature is split on what convention should be followed in defining λ. Many authors define it without the "2" in the denominator. This can be very confusing, so be aware of this.)
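A quick simulation can make the λ convention concrete. The sketch below uses Python with only the standard library (not the R used for the plot in this post), and it takes the simplest case, A = V = I, so that x'Ax = x'x and AV is trivially idempotent. The mean vector `mu`, the seed, the number of draws, and the function name are all illustrative assumptions. With λ = ½ (μ'Aμ), the mean of the non-central Chi-square variable is v + 2λ; under the other convention the same quantity would be written as v + λ.

```python
import random


def simulate_quadratic_form_mean(mu, n_draws=200_000, seed=12345):
    """Monte Carlo average of x'x, where x ~ N(mu, I).

    This is the special case A = V = I of the theorem, so x'x is
    non-central Chi-square with v = len(mu) degrees of freedom.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += sum(rng.gauss(m, 1.0) ** 2 for m in mu)
    return total / n_draws


mu = [1.0, 0.5, -1.5]                  # illustrative mean vector (an assumption)
v = len(mu)                            # rank(A) = n when A = I
lam = 0.5 * sum(m * m for m in mu)     # lambda = (1/2) mu'A mu, the convention used here

sim_mean = simulate_quadratic_form_mean(mu)
theory_mean = v + 2.0 * lam            # mean of the non-central Chi-square under this convention

print("lambda (with the 1/2):", lam)
print("simulated mean of x'x:", round(sim_mean, 3))
print("v + 2*lambda         :", theory_mean)
```

If the simulated mean had matched v + λ instead, that would be a sign that the "2" in the denominator had been dropped somewhere along the way.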
The density function for a non-central Chi-square random variable is quite interesting. It's an infinite weighted sum of central Chi-square densities, with Poisson weights. That is, it's of the form:
$$ f(x; v, \lambda) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k} \, x^{v/2 + k - 1} \, e^{-x/2}}{2^{v/2 + k} \, \Gamma(v/2 + k) \, k!} \; ; \quad x \geq 0 , $$
where the range of summation is from k = 0 to k = ∞.
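The Poisson-mixture form of this density is easy to evaluate numerically. Here is a minimal Python sketch (standard library only) that truncates the infinite sum at a fixed number of terms. The function names, the truncation point `n_terms`, and the evaluation points are illustrative assumptions, and λ follows the "with the ½" convention used in this post. For simplicity the boundary point x = 0 is treated as having zero density.

```python
import math


def central_chi2_pdf(x, df):
    """Density of a central Chi-square variable with df degrees of freedom."""
    if x <= 0.0:
        return 0.0  # simplification: the boundary x = 0 is ignored
    half = df / 2.0
    log_pdf = ((half - 1.0) * math.log(x) - x / 2.0
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)


def noncentral_chi2_pdf(x, df, lam, n_terms=100):
    """Poisson(lam)-weighted sum of central Chi-square densities.

    The k-th term weights the central density with df + 2k degrees of
    freedom by exp(-lam) * lam**k / k!.  The infinite sum is truncated
    at n_terms, which is ample for moderate values of lam.
    """
    if lam == 0.0:
        return central_chi2_pdf(x, df)
    total = 0.0
    for k in range(n_terms):
        log_weight = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_weight) * central_chi2_pdf(x, df + 2 * k)
    return total


# Evaluate the density at a few points for v = 3 and several values of lambda.
for lam in (0.0, 1.0, 3.0):
    values = [round(noncentral_chi2_pdf(x, 3, lam), 4) for x in (1.0, 5.0, 10.0)]
    print("lambda =", lam, "density at x = 1, 5, 10:", values)
```

Setting `lam` to zero collapses the mixture to its first term, which is just the central Chi-square density.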
To illustrate things, the following plot shows what the density function for a non-central χ² distribution looks like for various values of λ, when the degrees of freedom are v = 3. (The R code that I used to create this plot is available on the code page for this blog.)
A non-central F distribution arises when we have two independent random variables. The first is non-central Chi-square, with v1 degrees of freedom, and a non-centrality parameter, λ. The second is central Chi-square, with v2 degrees of freedom. The random variable,
$$ F = \frac{\chi^2(v_1, \lambda) / v_1}{\chi^2(v_2) / v_2} , $$
is non-central F, with v1 and v2 degrees of freedom, and non-centrality parameter, λ.
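This construction can be mimicked directly by simulation. The Python sketch below (standard library only) builds the numerator as x'x with x ~ N[μ, I], which is non-central Chi-square with v1 = len(mu) degrees of freedom and λ = ½ (μ'μ), and draws the denominator as a Gamma variable with shape v2/2 and scale 2, which is central Chi-square with v2 degrees of freedom. The values of `mu` and `v2`, the seed, and the function name are illustrative assumptions. As a check, the simulated mean is compared with v2(v1 + 2λ) / [v1(v2 - 2)], which follows from the independence of the two Chi-square variables and is valid for v2 > 2.

```python
import random


def draw_noncentral_f(mu, v2, rng):
    """One draw of F = [chi2(v1, lam) / v1] / [chi2(v2) / v2], with v1 = len(mu)."""
    v1 = len(mu)
    # Non-central Chi-square: x'x with x ~ N(mu, I)
    numerator = sum(rng.gauss(m, 1.0) ** 2 for m in mu)
    # Central Chi-square(v2) is Gamma with shape v2/2 and scale 2
    denominator = rng.gammavariate(v2 / 2.0, 2.0)
    return (numerator / v1) / (denominator / v2)


rng = random.Random(2024)              # arbitrary seed, for reproducibility
mu = [1.0, 0.5, -1.5]                  # illustrative mean vector (an assumption)
v1, v2 = len(mu), 20
lam = 0.5 * sum(m * m for m in mu)     # lambda = (1/2) mu'mu

n_draws = 100_000
f_mean_sim = sum(draw_noncentral_f(mu, v2, rng) for _ in range(n_draws)) / n_draws
f_mean_theory = v2 * (v1 + 2.0 * lam) / (v1 * (v2 - 2))

print("simulated mean of F  :", round(f_mean_sim, 3))
print("theoretical mean of F:", round(f_mean_theory, 3))
```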
Finally, a non-central Student-t arises when we have a first random variable, X1, that is N[μ , 1], and a second (independent) random variable, X2, that is Chi-square with v degrees of freedom. Then the random variable,
$$ t = \frac{X_1}{\sqrt{X_2 / v}} , $$
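follows a non-central Student-t distribution with v degrees of freedom, and a non-centrality parameter of λ = (μ'μ)/2 = μ²/2, because μ is a scalar here. (Conventions differ here too: many authors and software packages take the non-centrality parameter of this distribution to be μ itself, which retains its sign.)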
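Again, a short simulation shows the construction at work. This Python sketch (standard library only) draws X1 from N[μ , 1] and X2 from a Chi-square distribution with v degrees of freedom, generated as a Gamma variable with shape v/2 and scale 2. The values of `mu` and `df`, the seed, and the function name are illustrative assumptions. The simulated mean is compared with the standard result E[t] = μ (v/2)^½ Γ((v - 1)/2) / Γ(v/2), which holds for v > 1. Notice that the mean depends on the sign of μ, not just on μ².

```python
import math
import random


def draw_noncentral_t(mu, df, rng):
    """One draw of t = X1 / sqrt(X2 / df), with X1 ~ N(mu, 1) and X2 ~ chi2(df)."""
    x1 = rng.gauss(mu, 1.0)
    x2 = rng.gammavariate(df / 2.0, 2.0)   # central Chi-square(df)
    return x1 / math.sqrt(x2 / df)


rng = random.Random(777)     # arbitrary seed, for reproducibility
mu, df = 1.5, 10             # illustrative values (assumptions)

n_draws = 100_000
t_mean_sim = sum(draw_noncentral_t(mu, df, rng) for _ in range(n_draws)) / n_draws

# E[t] = mu * sqrt(df / 2) * Gamma((df - 1) / 2) / Gamma(df / 2), for df > 1
t_mean_theory = mu * math.sqrt(df / 2.0) * math.exp(
    math.lgamma((df - 1) / 2.0) - math.lgamma(df / 2.0)
)

print("simulated mean of t  :", round(t_mean_sim, 3))
print("theoretical mean of t:", round(t_mean_theory, 3))
```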
Let's see why such non-central distributions are important in the context of hypothesis testing. Suppose, for example, that we're conducting a test for which the test statistic follows a (central) χ² distribution with v = 3 when the null hypothesis (H0) is true, and a non-central χ² distribution when H0 is false. For a 5% significance level, the critical value is 7.815, and this is shown with the green marker in the above plot. In that plot we see that as λ increases, the tail area to the right of the critical value increases (monotonically). This area is the probability of rejecting H0. When λ = 0 this area is 5%. It's just the chosen significance level - the probability of rejecting H0 when it is true.
However, for positive values of λ this area is the probability of rejecting H0 when it is false, to some degree or other. So, the areas under the red and blue curves, to the right of "crit", give us points on the power curve associated with the test.
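These tail areas can be computed directly, which gives points on the power curve. The following is a minimal Python sketch (standard library only), assuming the v = 3, 5% critical value of 7.815 from the example above. It obtains the central Chi-square tail probability from a series for the regularized incomplete gamma function, and then mixes those tail probabilities with Poisson weights. The function names, the truncation points, and the grid of λ values are illustrative assumptions, and λ again follows the "with the ½" convention.

```python
import math


def chi2_sf(x, df, n_terms=500):
    """P[chi2(df) > x], using the series
    P(a, z) = z**a * exp(-z) * sum_{n>=0} z**n / Gamma(a + n + 1),
    with a = df / 2 and z = x / 2.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    for n in range(1, n_terms):
        term *= z / (a + n)
        total += term
        if term <= 1e-16 * total:
            break
    return max(0.0, 1.0 - total)


def noncentral_chi2_sf(x, df, lam, n_terms=200):
    """P[chi2(df, lam) > x] as a Poisson(lam) mixture of central tail areas."""
    if lam == 0.0:
        return chi2_sf(x, df)
    total = 0.0
    for k in range(n_terms):
        log_weight = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_weight) * chi2_sf(x, df + 2 * k)
    return total


crit, df = 7.815, 3          # 5% critical value for v = 3, as in the text
lam_grid = [0.0, 0.5, 1.0, 2.0, 5.0]
powers = [noncentral_chi2_sf(crit, df, lam) for lam in lam_grid]

for lam, p in zip(lam_grid, powers):
    print("lambda =", lam, " P[reject H0] =", round(p, 4))
```

The first entry, at λ = 0, is the significance level itself, and the rejection probabilities rise with λ from there.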
In a follow-up post I'll be discussing the role of the non-centrality parameter in determining the power of a test in more detail. The usual F-test for linear restrictions on a regression coefficient vector will be used as an example. In addition, that post will provide some computational details.
Searle, S. R., 1971. Linear Models, Wiley, New York.