Monday, March 4, 2013

Measuring the Quality of an Estimator

In which, with almost no symbols, I encourage students and practitioners to question what they've been taught...

When it comes to introducing our students to the notion of the "quality" of an estimator, most of us begin by observing that estimators are functions of the random sample data, and hence they are "statistics" in the literal sense. As such, estimators have a probability distribution. We give this distribution a special name - the "sampling distribution" of the estimator in question.

It's understandable that students sometimes find the concept of the sampling distribution a little tricky when they first encounter it. After all, it's based on a "thought game" of sorts. We have to consider the idea of repeatedly drawing samples of a fixed size, for ever, constructing the statistic in question, and then keeping track of all of the possible values that the statistic can take, together with the relative frequency of occurrence for each value. A Monte Carlo experiment is the obvious way to introduce students to this concept.
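This "thought game" can be sketched in a few lines of code. Here's a minimal Monte Carlo illustration (my own toy example, not from the post: an exponential population with mean 1, samples of size n = 10, and the sample mean as the statistic):

```python
import numpy as np

rng = np.random.default_rng(42)

# Approximate the sampling distribution of the sample mean by repeatedly
# drawing samples of a fixed size and recording the statistic each time.
n, reps = 10, 100_000
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)   # one value of the statistic per replication

# The empirical distribution of 'means' approximates the sampling distribution:
print(means.mean())   # close to the population mean, 1.0
print(means.std())    # close to the theoretical std. error, 1/sqrt(10)
```

A histogram of `means` would then give students a picture of the sampling distribution itself, including its skewness for small n.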

Of course, if we're approaching the topic of statistical inference (including estimation) from a Bayesian perspective, rather than from a "frequentist" point of view, then the sampling distribution is largely irrelevant. Why? Because Bayesians aren't interested in the performance of inferences in the context of repeated sampling - they develop methods that are "optimal" (in various senses) when we condition on just a single sample of data.

Returning to the frequentist view of estimator quality, the important point is that the sampling distribution provides the basis for the judgements and comparisons that we make when appraising inferential procedures such as estimators and tests. 

For example, the mean of the sampling distribution of the estimator (i.e., the mean of the estimator itself) is used to define the bias of the estimator. The bias is the extent to which this mean differs from the true (unknown) value of the parameter being estimated. In other words, we're asking "how accurate is the estimator, on average, if we were to repeatedly use it?"

If our estimator is unbiased, the variance of the sampling distribution of our estimator can be compared with that of any other unbiased estimator, to measure the relative "efficiency" of the estimators in question. A small variance for the sampling distribution means that the estimator is using the sample information in a relatively efficient, or effective, way. High relative efficiency (or low relative variability) is usually deemed to be a good property for an estimator to have.

By adding together the variance of an estimator (i.e., the variance of its sampling distribution) and the square of the estimator's bias, we get the mean squared error (MSE) of the estimator. This composite measure can then be used to rank estimators in terms of relative efficiency when one or more of the estimators are biased, and it allows for a bias-variance trade-off.  Small relative MSE implies high relative efficiency. 
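These three quantities are easy to estimate by simulation. A hedged sketch (an illustrative example of my own choosing, not one from the references): comparing the two usual estimators of a normal variance, dividing by n versus dividing by n - 1, shows the bias-variance trade-off at work, since the biased version turns out to have the smaller MSE here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 5, 200_000, 4.0
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

# Two estimators of sigma^2: the MLE (divide by n) and the usual
# unbiased estimator (divide by n - 1).
s2_mle = x.var(axis=1, ddof=0)
s2_unb = x.var(axis=1, ddof=1)

def bias_var_mse(est, true_value):
    """Monte Carlo estimates of bias, variance, and MSE = bias^2 + variance."""
    bias = est.mean() - true_value
    var = est.var()
    return bias, var, bias**2 + var

bias_mle, var_mle, mse_mle = bias_var_mse(s2_mle, sigma2)
bias_unb, var_unb, mse_unb = bias_var_mse(s2_unb, sigma2)
print(bias_mle, mse_mle)   # theory: bias = -sigma2/n = -0.8
print(bias_unb, mse_unb)   # theory: bias = 0, MSE = 2*sigma2**2/(n - 1) = 8
```

Despite its bias, the MLE has the smaller variance, and hence the smaller MSE, so it would be ranked as the more efficient of the two on the MSE criterion.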

These ideas hold for any sample size, n. Let's not consider the large-n asymptotic properties of estimators here.

Notice that this traditional frequentist approach to measuring the quality of an estimator rests on just the mean and the variance of the estimator's sampling distribution. Has it ever occurred to you that this reliance on just the first two moments of the sampling distribution may be very misleading if the estimator's sampling distribution is highly skewed, or perhaps even multi-modal?

Such situations certainly arise in practice. For instance, the sampling distribution of the sample variance is skewed to the right. Further, as Phillips (2006) and others have shown, the sampling distributions of instrumental variables estimators, such as the 2SLS estimator, can be bi-modal - especially in the context of "weak instruments". In addition, Fiorio et al. (2010) discuss situations where t-statistics can be bi-modally distributed.

So, if you're a frequentist (rather than a Bayesian), wouldn't it be more sensible to measure the quality of an estimator in terms of its full sampling distribution, rather than in terms of just the first two moments of that distribution?

Guess what? Someone already thought of this possibility. In fact, lots of people have investigated the idea of ranking estimators in terms of a performance measure based on the full sampling distribution.

One well-known such measure was first suggested by the Australian statistician, Edwin J. G. Pitman. Suppose that T1 and T2 are two competing estimators of the parameter, θ. Pitman (1937) suggested that we compute the quantity:

                                  PN = Pr[ |T1 - θ| < |T2 - θ| ]

This quantity is usually called "Pitman's Nearness Measure" (or "Pitman's Measure of Closeness"). If PN > 0.5, we prefer the estimator T1 to T2, and conversely if PN < 0.5.
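In a simulation setting, PN is simple to estimate: just count how often T1 lands closer to θ than T2 does, using the same simulated samples for both estimators. A minimal sketch (the helper name pitman_nearness is my own, and the comparison of the sample mean with a single observation is an illustrative assumption, not an example from the literature cited here):

```python
import numpy as np

def pitman_nearness(t1, t2, theta):
    """Monte Carlo estimate of PN = Pr[ |T1 - theta| < |T2 - theta| ]."""
    return np.mean(np.abs(t1 - theta) < np.abs(t2 - theta))

# Toy check: for normal data, the sample mean should be Pitman-closer to the
# population mean than a single observation is.
rng = np.random.default_rng(1)
theta, n, reps = 0.0, 25, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))
pn = pitman_nearness(x.mean(axis=1), x[:, 0], theta)
print(pn)   # well above 0.5, so the sample mean is preferred
```

Note that both estimators are evaluated on the same samples, which respects the joint distribution that the PN definition requires.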

In 1991, Pitman's original paper was reproduced in a special issue of Communications in Statistics - Series A, devoted to his measure.

In practice, the computation of PN can be rather complicated, as it requires knowledge of the joint distribution of T1 and T2. Moreover, the possibility of a "tie" between the estimators being compared raises some subtle technical issues that I'll pass over here. For a good discussion of this point, see Keating et al. (1993, pp. 85-90).

Let's illustrate the use of Pitman's PN measure, and compare it with our traditional MSE measure of estimator quality. This example comes from Khattree (1992), and it involves estimating the variance (σ²) of a normal population with a known mean, μ. Suppose that we consider two estimators of the variance:

                     T1 = (n - 1)⁻¹ Σ(xᵢ - μ)²   ;   and   T2 = (n - 1)⁻¹ Σ(xᵢ - x̄)²,

where x̄ is the sample average.

It can be shown that T1 is a biased estimator, with a bias equal to σ²/(n - 1). T2 is unbiased, of course, and it has smaller MSE than T1. However, T1 dominates T2 in terms of Pitman-nearness, except when n = 2.
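This reversal of rankings is easy to check by simulation. A sketch, assuming n = 5, μ = 0, and σ² = 1 (my own settings for illustration, not Khattree's specific table entries):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu, sigma2 = 5, 500_000, 0.0, 1.0
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

t1 = ((x - mu) ** 2).sum(axis=1) / (n - 1)   # uses the known mean; biased
t2 = x.var(axis=1, ddof=1)                   # usual unbiased sample variance

mse1 = ((t1 - sigma2) ** 2).mean()
mse2 = ((t2 - sigma2) ** 2).mean()
pn = np.mean(np.abs(t1 - sigma2) < np.abs(t2 - sigma2))
print(mse1 > mse2, pn)   # T2 wins on MSE, yet PN > 0.5 favours T1
```

So the MSE criterion and the PN criterion disagree about which of these two estimators is "better", which is exactly the point of Khattree's example.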

The great Indian statistician, C.R. Rao, has been a staunch advocate of PN. For example, see Rao (1981) for some interesting examples where the use of PN seems to be more compelling than the use of MSE.

In addition, PN may be defined, and hence usable for ranking estimators, in cases where MSE isn't even defined (perhaps because the first two moments of the sampling distribution of one or more of the estimators do not exist). A classic example is where we want to estimate the median of a Cauchy distribution - a distribution that has no finite integer-order moments. Blyth and Pathak (1985) suggest two estimators for the case where n = 2: T1 = X1; and T2 = (X1 + X2)/2. Both estimators have sampling distributions that are themselves Cauchy, so the MSE is not defined for either T1 or T2. However, it can be shown that PN = 0.3956, so T2 is favoured over T1.
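A quick simulation sketch of this Cauchy example (the seed and replication count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
reps, theta = 500_000, 0.0
x = theta + rng.standard_cauchy(size=(reps, 2))   # samples of size n = 2

t1 = x[:, 0]          # T1: the first observation alone
t2 = x.mean(axis=1)   # T2: the average of the two observations

# PN is well-defined here even though neither estimator has an MSE.
pn = np.mean(np.abs(t1 - theta) < np.abs(t2 - theta))
print(round(pn, 3))   # close to the 0.3956 figure quoted above
```

Since PN < 0.5, the simulation agrees that T2 is to be preferred, even though neither estimator can be ranked by MSE.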

However, it should be noted that the PN measure has also been criticized on various grounds. For example, it can generate intransitivities (Keating et al., 1993, pp. 75-82). Pitman's nearness measure can be generalized in a variety of ways to at least partially address some of these concerns, but I won't go into these details here. A critique of PN is provided, for example, by Christian et al. (1993), and the discussion that follows that particular paper in JASA is well worth reading.

An interesting regression application of Pitman's nearness measure is given by Keating and Mason (2005). They use this measure to compare the unadjusted and adjusted coefficients of determination from a linear OLS regression model. They show, among other things, that if these statistics are viewed as estimators of the population correlation coefficient, then the way in which they are ranked in terms of MSE doesn't match the ranking in terms of PN.

In a recent paper (currently under revision), Jacob Schwartz and I used Pitman's nearness (together with relative bias and relative MSE) to evaluate the performance of various bias-corrected estimators for the zero-inflated Poisson model (Schwartz and Giles, 2013). One of the things that we found was that the rankings of different bias-adjusted estimators, on the basis of relative bias, can be reversed if we use the PN criterion instead. On the other hand, the rankings in terms of PN tended to accord with the rankings in terms of relative MSE, for this particular estimation problem.

So, when it comes to comparing different estimators it's worth keeping in mind that while bias and relative efficiency are important concepts, they are based on only a small part of the overall set of features of the sampling distributions of the estimators in question. You might want to consider using Pitman's nearness measure as a more broadly-based indicator of estimator performance.


Bekker, P. A., 1994. Alternative approximations to the distributions of instrumental variable estimators. Econometrica, 62, 657-681.

Blyth, C. R. & P. K. Pathak, 1985. Does an estimator's distribution suffice?. Proceedings of the Berkeley Conference in Honor of J. Neyman and J. Kiefer, 1, 45-52.

Christian, P. R., J. T. G. Hwang, & W. E. Strawderman, 1993. Is Pitman closeness a reasonable criterion?. Journal of the American Statistical Association, 88, 57-63.

Fiorio, C. V., V. A. Hajivassiliou, & P. C. B. Phillips, 2010. Bimodal t-ratios: The impact of thick tails on inference. Econometrics Journal, 13, 271-289.

Keating, J. P., R. L. Mason, & P. K. Sen, 1993. Pitman's Measure of Closeness - A Comparison of Statistical Estimators. SIAM, Philadelphia.

Keating, J. P. and R. L. Mason, 2005. Pitman nearness comparison of the traditional estimator of the coefficient of determination and its adjusted version in linear regression models. Communications in Statistics - Theory and Methods, 34, 367-374.

Khattree, R., 1992. Comparing estimators for population variance using Pitman nearness. American Statistician, 46(3), 214-217.

Phillips, P. C. B., 2006. A remark on bimodality and weak instrumentation in structural equation estimation. Econometric Theory, 22, 947-960.

Pitman, E. J. G., 1937. The "closest" estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society, 33, 212-222.

Rao, C. R., 1981. Some comments on the minimum mean square error as a criterion of estimation. In  D. A. Dawson et al. (eds.), Statistics and Related Topics. North-Holland, Amsterdam, 123-143.

Schwartz, J. and D. E. Giles, 2013. Bias-reduced maximum likelihood estimation for the zero-inflated Poisson distribution. Mimeo., Department of Economics, University of Victoria.

© 2013, David E. Giles