Tuesday, April 16, 2013

Being Unbiased Isn't Everything!

When we first learn about estimation, we encounter various properties that estimators might possess. Unless your first course in statistics or econometrics takes a fully Bayesian stance, then these properties will be ones based on the sampling distribution of the statistic that is being used as the estimator.

There are plenty of unsettling things that can be raised against the notion of the sampling distribution, but let's put those to one side here. In elementary courses, attention usually focuses on just the mean and variance of an estimator's sampling distribution. I'm not endorsing this - it's just a fact of life.

For example, we learn that an estimator of a parameter is "unbiased" if the mean of the estimator's sampling distribution is located at the true (but unknown) value of the parameter of interest. That is, if E[θ*] = θ, where θ* is the estimator of θ.

Similarly, we learn that if we are comparing two unbiased estimators of a scalar parameter, θ, say θ* and θ+, then θ* is said to be relatively "more efficient" than θ+ if Var.[θ*] < Var.[θ+].

The next thing that we usually encounter is an extension of this notion of relative efficiency to the case where one or both of the competing estimators are biased. Students are taught about the "Mean Squared Error" (MSE) of an estimator, defined as MSE[θ*] = E[(θ* - θ)²], and they are shown that MSE[θ*] = Var.[θ*] + (Bias[θ*])². Then, θ* is relatively more efficient than θ+ if MSE[θ*] < MSE[θ+].
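For readers who want the missing step, the decomposition follows from adding and subtracting E[θ*] inside the square. The sketch below uses the same symbols as the text and assumes only that the first two moments of θ* exist.

```latex
\begin{align*}
\mathrm{MSE}[\theta^{*}]
  &= E\big[(\theta^{*} - \theta)^{2}\big] \\
  &= E\big[(\theta^{*} - E[\theta^{*}] + E[\theta^{*}] - \theta)^{2}\big] \\
  &= E\big[(\theta^{*} - E[\theta^{*}])^{2}\big]
     + 2\,(E[\theta^{*}] - \theta)\,E\big[\theta^{*} - E[\theta^{*}]\big]
     + (E[\theta^{*}] - \theta)^{2} \\
  &= \mathrm{Var}[\theta^{*}] + \big(\mathrm{Bias}[\theta^{*}]\big)^{2}.
\end{align*}
```

The cross-product term vanishes because E[θ* - E[θ*]] = 0, and E[θ*] - θ is a constant, namely the bias.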

A familiar example that compares the properties of unbiasedness and relative efficiency is as follows. Suppose that we have a simple random sample, of size n, drawn from a population that is N[μ, σ²], and we want to estimate σ². Let xbar be the sample average. Then the estimator, s² = (Σ[xᵢ - xbar]²) / (n - 1), is unbiased for σ². However, the biased estimator, σ*² = (Σ[xᵢ - xbar]²) / n, has smaller MSE than s², and so it's relatively more efficient than the latter estimator.

Moreover, note that if we consider the family of estimators of σ², of the form (Σ[xᵢ - xbar]²) / (n + c), where "c" is a constant, the member of this family with the smallest MSE arises when c = 1. However, how often have you used this particular estimator of σ²?
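To check the claim about c = 1 numerically, here is a minimal sketch. It relies on the standard result that, under normal sampling, Σ[xᵢ - xbar]² is distributed as σ² times a chi-square variate with n - 1 degrees of freedom, so its mean is (n - 1)σ² and its variance is 2(n - 1)σ⁴. The function name and the choice n = 10 are just illustrative assumptions, not part of the original discussion.

```python
def mse_variance_estimator(n, c, sigma2=1.0):
    """Exact MSE of sum((x_i - xbar)^2) / (n + c) under normal sampling.

    Assumes the sum of squares is sigma2 * chi-square(n - 1), so that it has
    mean (n - 1) * sigma2 and variance 2 * (n - 1) * sigma2 ** 2.
    """
    if n < 2 or n + c <= 0:
        raise ValueError("need n >= 2 and n + c > 0")
    divisor = n + c
    variance = 2.0 * (n - 1) * sigma2 ** 2 / divisor ** 2
    bias = ((n - 1) / divisor - 1.0) * sigma2
    return variance + bias ** 2


n = 10  # illustrative sample size
for c in (-1, 0, 1, 2):
    # c = -1 is the unbiased s^2; c = 0 divides by n; c = 1 divides by n + 1
    print(f"c = {c:2d}: MSE = {mse_variance_estimator(n, c):.4f}")
```

With σ² = 1 the MSE works out to [2(n - 1) + (c + 1)²] / (n + c)², and setting its derivative with respect to c to zero gives c = 1 for any n ≥ 2.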

Of course, we usually stick with s², for very good reasons, and not just because it is unbiased. (For example, recall how s² is used in the development of the Student-t statistic that we use for testing hypotheses about the other parameter in the problem, μ.)

So, when it comes to estimators, unbiasedness certainly isn't everything.

Here's an even more compelling example to illustrate this point. I came across it in a short paper by Michael Hardy (2003), but he attributes it to Romano & Siegel (1986). Here's the example.

Suppose that the random variable X follows a Poisson distribution, with parameter λ. We want to estimate the quantity, (Pr.[X = 0])² = e^(-2λ), based on a single sample observation, x. It turns out that the only unbiased estimator of this quantity (for any positive λ), is (-1)^x.
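A short sketch of why this is so, using the Poisson probability function and the power series for the exponential. The symbol g, standing for an arbitrary candidate estimator, is notation introduced here for illustration.

```latex
\begin{align*}
E\big[(-1)^{X}\big]
  &= \sum_{x=0}^{\infty} (-1)^{x}\,\frac{e^{-\lambda}\lambda^{x}}{x!}
   = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(-\lambda)^{x}}{x!}
   = e^{-\lambda}\,e^{-\lambda}
   = e^{-2\lambda}. \\[6pt]
\text{If } E[g(X)] = e^{-2\lambda} \text{ for all } \lambda > 0:\quad
  \sum_{x=0}^{\infty} g(x)\,\frac{\lambda^{x}}{x!}
  &= e^{-\lambda}
   = \sum_{x=0}^{\infty} (-1)^{x}\,\frac{\lambda^{x}}{x!}.
\end{align*}
```

Two power series in λ that agree for every positive λ must have identical coefficients, so g(x) = (-1)^x for every x. That is the sense in which the unbiased estimator is unique.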

However, this unbiased estimator is clearly an absurd choice. For instance, if we observe any x that is odd, then the estimate is just -1. This would be a ridiculous choice, as we know that the quantity being estimated, e^(-2λ), must lie in the interval (0, 1]. The maximum likelihood estimator, namely e^(-2x), is biased but it will return a "plausible" estimate.
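The comparison can also be made in terms of MSE. The sketch below computes the bias and MSE of both estimators by summing over the Poisson probabilities directly, truncating the sum at an arbitrary cut-off (x_max = 200, an assumption that is harmless for the small values of λ used here). The function names and the values of λ are illustrative only.

```python
import math


def poisson_pmf(lam, x_max=200):
    """Return [Pr(X = 0), ..., Pr(X = x_max)] for X ~ Poisson(lam)."""
    probs = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        # Recurrence: Pr(X = x) = Pr(X = x - 1) * lam / x
        probs.append(probs[-1] * lam / x)
    return probs


def bias_and_mse(estimator, lam, x_max=200):
    """Bias and MSE of estimator(x) as an estimator of exp(-2 * lam)."""
    target = math.exp(-2.0 * lam)
    probs = poisson_pmf(lam, x_max)
    mean = sum(p * estimator(x) for x, p in enumerate(probs))
    mse = sum(p * (estimator(x) - target) ** 2 for x, p in enumerate(probs))
    return mean - target, mse


def unbiased(x):
    """The unique unbiased estimator, (-1)^x."""
    return (-1.0) ** x


def mle(x):
    """The maximum likelihood estimator, exp(-2x)."""
    return math.exp(-2.0 * x)


for lam in (0.5, 1.0, 2.0):
    b_u, mse_u = bias_and_mse(unbiased, lam)
    b_m, mse_m = bias_and_mse(mle, lam)
    print(f"lambda = {lam}: unbiased bias = {b_u:.4f}, MSE = {mse_u:.4f}; "
          f"MLE bias = {b_m:.4f}, MSE = {mse_m:.4f}")
```

For these values of λ, the maximum likelihood estimator has positive bias but a much smaller MSE than (-1)^x, whose MSE equals 1 - e^(-4λ) and so is close to 1 unless λ is very small.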

We shouldn't get too "hung up" on insisting on unbiasedness when we're evaluating the merits of a potential estimator.


Hardy, M., 2003. An illuminating counterexample. American Mathematical Monthly, March.

Romano, J. P. and A. F. Siegel, 1986. Counterexamples in Probability and Statistics. Wadsworth & Brooks/Cole, Monterey, CA.

© 2013, David E. Giles


  1. David,

    I really like this. I feel that bias is something which modern statistics can readily deal with, and unbiasedness isn't much of a basis for choosing an estimator.

  2. Thank you,
    The emphasis on bias in the classroom is confusing from the perspective of someone who is a practitioner with a sort of tradesman's training in statistics. It always feels trivial, certainly compared to the effect of the size of n, but also to the overall imprecision in the estimate.
    For those with limited time to get a grasp of how to make use of statistics it doesn't feel like a proper ranking of emphasis in terms of teaching what is important.
    It's a tricky issue,