Thursday, October 18, 2012

Let's be Consistent

One of the standard large-sample properties that we hope our estimators will possess is "consistency". Indeed, most of us take the position that if an estimator isn't consistent, then we should probably throw it away and look for one that is!

When you're talking about the consistency of an estimator, it's a really good idea to be quite clear regarding the precise type of consistency you have in mind - especially if you're talking to a statistician! For example, there's "weak consistency", "strong consistency", "mean square consistency", and "Fisher consistency", at least some of which you'll undoubtedly encounter from time to time as an econometrician.

When we first meet the concept of a “consistent” estimator, as students, we usually learn about what is actually called “mean square consistency”. This notion is usually described in the following manner, where for simplicity θ* is an estimator of a scalar-valued parameter, θ, based on a sample of size n:

If           (i)    Bias(θ*) → 0  as n → ∞ ;

and        (ii)   var.(θ*) → 0  as n →∞ ,

then  θ*  is a “mean square consistent” estimator of  θ.

Note that the two conditions above imply that the mean squared error of the estimator converges to the value zero as n grows without limit – hence the terminology. If the parameter and its estimator are vectors, then we simply replace the “variance” with the “covariance matrix” in condition (ii).
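The link between the two conditions and the mean squared error can be written out as a short derivation. This is the standard decomposition, sketched here in LaTeX; the cross-product term drops out because E[θ* - E(θ*)] = 0.

```latex
\begin{aligned}
\mathrm{MSE}(\theta^*) &= E\big[(\theta^* - \theta)^2\big] \\
&= E\big[(\theta^* - E(\theta^*))^2\big] + \big(E(\theta^*) - \theta\big)^2 \\
&= \mathrm{var}(\theta^*) + \big[\mathrm{Bias}(\theta^*)\big]^2 .
\end{aligned}
```

So if the bias and the variance both go to zero as n grows, the mean squared error must go to zero as well.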

Basically, what is happening under mean square consistency is that the density associated with the estimator's sampling distribution is collapsing to a degenerate "spike", located exactly at the true value of  θ, when the sample size grows without limit.
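To see this collapse numerically, here is a minimal simulation sketch in Python. It is an illustration only: the Normal population, the values μ = 2 and σ = 3, the number of replications, and the function name simulated_mse are all assumptions made for the example. The estimator is the sample mean, whose theoretical mean squared error is σ²/n.

```python
import random

def simulated_mse(n, reps=500, mu=2.0, sigma=3.0, seed=0):
    """Monte Carlo estimate of E[(xbar - mu)^2] for the mean of n Normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / reps

# Compare the simulated MSE with the theoretical value sigma^2 / n.
for n in (10, 100, 1000):
    print(n, simulated_mse(n), 3.0 ** 2 / n)
```

The simulated values should shrink roughly in proportion to 1/n, which is the "spike" forming around the true value.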

This type of consistency is actually rather a strong property. Specifically, it requires that E(θ*) and E(θ*²) are both defined, for all values of n. These expectations are only defined if the underlying integrals converge. (Recall that for any continuous random variable, X, E(X) = ∫xp(x)dx.) These integrals actually diverge for many distributions. For example, in the case of the Student-t distribution with v degrees of freedom, these two expectations are defined only if v > 2.

So, mean square consistency is a nice property, but it may not be possible to even talk about its existence in some cases.

However, a weaker form of consistency, based on the notion of the “probability limit”, or the “plim”, can be used even when we can’t consider mean square consistency. We say that our estimator "converges in probability" to the true parameter value, or is “weakly consistent”, if plim(θ*) = θ. That is, if lim(n→ ∞)Pr.[|θ* - θ| < ε] = 1 for every positive number ε, however small.
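The probability in this definition can be approximated by simulation. The following is a rough sketch, not a proof: it assumes a standard Normal population, takes θ* to be the sample mean, fixes ε = 0.1, and uses a hypothetical helper called coverage that reports the share of replications in which |θ* - θ| < ε.

```python
import random

def coverage(n, eps=0.1, reps=400, mu=0.0, seed=1):
    """Share of replications in which the sample mean lies within eps of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / reps

# The share should move towards 1 as the sample size grows.
for n in (10, 100, 1000):
    print(n, coverage(n))
```

Choosing a smaller ε just means that a larger n is needed before the share gets close to one.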

You may have been told that if an estimator is mean square consistent, then it must also be weakly consistent, but that the converse is not necessarily true. Let’s see what is involved in establishing this result. Essentially, we just exploit a well-known theorem from probability theory, known as Chebyshev’s Inequality.

(Sometimes you’ll see this Russian mathematician’s name spelled differently. That’s because there is often more than one acceptable transliteration from the Cyrillic alphabet to ours.)

Anyway, here’s the inequality in question:

Let X be a random variable and let g(.) be a non-negative real-valued function.
Then, Pr.[g(X) ≥ k] ≤ E[g(X)] / k, for all k > 0.
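As a quick sanity check on the inequality above, the sketch below compares both sides by simulation. The choices here are assumptions made purely for illustration: X is standard Normal, g(X) = X², and check_inequality is a hypothetical helper name.

```python
import random

def check_inequality(k, draws=20000, seed=2):
    """Return simulated (Pr[g(X) >= k], E[g(X)] / k) for g(X) = X^2, X ~ N(0, 1)."""
    rng = random.Random(seed)
    g_values = [rng.gauss(0.0, 1.0) ** 2 for _ in range(draws)]
    lhs = sum(g >= k for g in g_values) / draws  # estimate of Pr[g(X) >= k]
    rhs = (sum(g_values) / draws) / k            # estimate of E[g(X)] / k
    return lhs, rhs

for k in (0.5, 1.0, 2.0, 4.0):
    lhs, rhs = check_inequality(k)
    print(k, lhs, rhs, lhs <= rhs)
```

The bound is often far from tight, but that doesn't matter for the argument: all that is needed is an upper bound that goes to zero.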

Now, consider the probability that we use when defining weak consistency, namely  Pr.[|θ* - θ| < ε].

Note that,

            Pr.[|θ* - θ| < ε] = Pr.[(θ* - θ)² < ε²] = 1 - Pr.[(θ* - θ)² ≥ ε²] .          (1)

Also,

            Pr.[(θ* - θ)² ≥ ε²] ≤ E[(θ* - θ)²] / ε² ,

by Chebyshev’s inequality.

(Here, g(θ*) is (θ* - θ)², which is clearly non-negative, as required, and k is ε².)

Now, if the estimator is mean square consistent, this means that E[(θ* - θ)²] → 0 as n → ∞, so the upper bound E[(θ* - θ)²] / ε² also goes to zero for any fixed ε > 0.

So, if the estimator is mean square consistent, then from (1),

                lim(n→∞)Pr.[|θ* - θ| < ε] = (1 - 0) = 1. 

That is, the estimator is also weakly consistent.

So, we have our result: mean square consistency implies weak consistency. 

On the other hand, weak consistency does not imply mean square consistency. One counter-example will suffice to show this. 

If we have a simple random sample of n observations, from a population with a finite mean and variance, then the sample average, x* = (Σ x_i) / n, is both a mean square consistent and a weakly consistent estimator of the population mean, μ. Consider using (1 / x*) as an estimator of (1 / μ), where μ ≠ 0.

This estimator is weakly consistent, by Slutsky's Theorem. However, if the population is Normal, then neither E[1 / x*] nor E[(1 / x*)²] exists. This implies that this estimator cannot be mean square consistent.
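A simulation can't prove that a moment fails to exist, but it can illustrate the weak consistency part of this counter-example. The sketch below assumes a Normal population with μ = 1 and σ = 2 (values chosen only for illustration) and uses a hypothetical helper, inverse_mean_experiment, that reports the share of replications with |1/x* - 1/μ| < ε together with the largest |1/x*| that was observed.

```python
import random

def inverse_mean_experiment(n, eps=0.1, reps=300, mu=1.0, sigma=2.0, seed=3):
    """Return (share with |1/xbar - 1/mu| < eps, largest |1/xbar| observed)."""
    rng = random.Random(seed)
    hits = 0
    largest = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        estimate = 1.0 / xbar  # xbar is exactly zero with probability zero
        hits += abs(estimate - 1.0 / mu) < eps
        largest = max(largest, abs(estimate))
    return hits / reps, largest

for n in (5, 50, 2000):
    share, largest = inverse_mean_experiment(n)
    print(n, share, largest)
```

The share close to 1/μ should rise towards one as n grows. For small n, the sample average can land very near zero, and then 1/x* is enormous. Draws of that kind are what prevent the expectations from being defined.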

Finally, here's a simple regression example to illustrate some of the above points. 

Suppose we have a simple regression model, where the only regressor is a time-trend variable. That is, x_t = t, for t = 1, 2, 3, ...., n. So, the model is:

         y_t = β1 + β2 x_t + ε_t   ;   ε_t ~ [0, σ²]

and the errors are serially independent. Given the zero mean for the errors, and the non-random regressor, the OLS estimators (b1 and b2) of β1 and β2 are unbiased. You should be able to show that

              var.(b1) = 2σ²[(n + 1)(2n + 1)] / [n(n - 1)(n + 1)] = 2σ²(2n + 1) / [n(n - 1)]
              var.(b2) = 12σ² / [n(n - 1)(n + 1)].

Clearly, both of these variances go to zero as n grows without limit, and both estimators are unbiased. Hence, they are both mean square consistent estimators. It follows that they are both also weakly consistent estimators.
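If you'd like to check the two variance expressions before deriving them, the following sketch compares them with the diagonal elements of σ²(X'X)⁻¹ for the time-trend regressor, using exact rational arithmetic. The function names ols_variances and closed_forms are just labels chosen for this example, and σ² is set to 1 by default.

```python
from fractions import Fraction

def ols_variances(n, sigma2=Fraction(1)):
    """Diagonal of sigma^2 (X'X)^{-1} when X has a constant and x_t = t, t = 1..n."""
    xs = range(1, n + 1)
    sx = Fraction(sum(xs))                  # sum of x_t
    sxx = Fraction(sum(x * x for x in xs))  # sum of x_t squared
    det = n * sxx - sx * sx                 # determinant of X'X
    return sigma2 * sxx / det, sigma2 * n / det

def closed_forms(n, sigma2=Fraction(1)):
    """The closed-form expressions for var(b1) and var(b2)."""
    denom = Fraction(n * (n - 1) * (n + 1))
    return 2 * sigma2 * (n + 1) * (2 * n + 1) / denom, 12 * sigma2 / denom

for n in (2, 10, 100, 1000):
    assert ols_variances(n) == closed_forms(n)
    print(n, [float(v) for v in closed_forms(n)])
```

Notice that var.(b1) shrinks at the rate 1/n, while var.(b2) shrinks at the much faster rate 1/n³, because the trend regressor has ever-increasing variation as n grows.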

© 2012, David E. Giles


  1. Thanks for the great information; this blog will be a real help for students.

  2. Great post, Dave. The textbooks usually (always?) start with plims and then build up to stronger forms of consistency, but you've convinced me that starting with convergence in mean square is the right way to teach it. I have to lecture on this in a couple of weeks, so I will give it a try straight away (and will cite you, of course!).


    1. Mark - thanks for the kind comment. I'll be interested to hear how it goes for you.

  3. Hi Dave. The lecture went pretty well, actually. I think your basic idea was absolutely right - instead of doing what most (all?) of the textbooks do and start with the weakest notion of consistency and then work up through stronger versions, start instead with a strong notion that's easier for students to get their heads around and then work down through weaker notions.

    At least, I *think* the lecture went pretty well. I've directed my students to your blog and to this entry in particular, so maybe some of them will want to comment on how this method worked from their perspective.


    1. Mark - thanks for the further feedback! I'll be interested to see if any of your students comment.


    2. Dear Dave, dear Mark,

      you are right, the approach to explaining consistency from the strong version to the weaker one worked quite nicely in Mark's lecture, at least for me. Since I haven't done much econometrics before, the lecture as well as this blog post helped me to understand the notion of consistency quite well (at least I hope so :)). So thank you both!


    3. Martin - that's great! Thanks for the feedback.


    4. I am one of Mark's students as well and can only agree with Martin that this exposition made it easier. I was motivated to understand the more abstract definitions of consistency after having heard the more intuitive ones. Thanks!


    5. Chistian - thanks for the helpful feedback!


  4. Many thanks for this exposition. Though I'm studying it lately, I believe it will help me in subsequent work and research. I'm also a student of Mark Schaffer.

  5. Thanks for the post!

