## Sunday, May 22, 2016

### A Quick Illustration of Pre-Testing Bias

The statistical and econometric literature on the properties of "preliminary-test" (or "pre-test") estimation strategies is large and well established. These strategies arise when we proceed in a sequential manner when drawing inferences about parameters.

A simple example would be where we fit a regression model; test if a regressor is significant or not; and then either retain the model, or else remove the (insignificant) regressor and re-estimate the (simplified) model.

The theoretical literature associated with pre-testing is pretty complex. However, some of the basic messages arising from that literature can be illustrated quite simply. Let's look at the effect of "pre-testing" on the bias of the OLS regression estimator.

Consider the following bivariate regression model:

yi = β1x1i + β2x2i + ui    ,

where x1 and x2 are non-random; and ui ~ i.i.d. N[0 , σ2] for all i.

Suppose that we adopt the following strategy:

1.  Test to see if x2 is a statistically significant regressor.
2.  If our test suggests that it is significant, then retain both x1 and x2 in the model.
3.  If our test suggests that it is not, then drop x2 from the regression, and re-estimate the model keeping just x1 as the sole regressor.
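As a rough sketch of this three-step strategy in code (the function name, the use of Python/NumPy, and the default 5% significance level are all my own illustrative choices, not part of the post):

```python
import numpy as np
from scipy.stats import t as t_dist

def pretest_estimate_b1(y, x1, x2, alpha=0.05):
    """Pre-test strategy: test H0: beta2 = 0 against HA: beta2 > 0;
    retain x2 if it is significant, otherwise drop it and re-estimate.
    (No intercept, matching the model in the post.)"""
    n = len(y)
    X = np.column_stack([x1, x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)        # unrestricted OLS: (b1U, b2)
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)                 # estimate of sigma^2
    XtX_inv = np.linalg.inv(X.T @ X)
    t2 = b[1] / np.sqrt(s2 * XtX_inv[1, 1])      # t-statistic for beta2
    t_crit = t_dist.ppf(1 - alpha, n - 2)        # one-sided critical value
    if t2 > t_crit:
        return b[0]                              # step 2: keep both regressors (b1U)
    return (x1 @ y) / (x1 @ x1)                  # step 3: x1 alone (b1R)
```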

Let's focus on the estimator of β1 that's actually associated with this strategy. If we stop at step 2, denote the (unrestricted) OLS estimator (MLE) as b1U. If we stop at step 3, denote the (restricted) OLS estimator (MLE) as b1R.

The null hypothesis that we'll be testing is H0: β2 = 0, and the alternative hypothesis is HA: β2 > 0 (although nothing of substance changes if we adopt a two-sided HA).

Given our assumptions about the error term in the model, the obvious (and UMP) test will be a t-test. Denote the associated test statistic by "t2", and let tc(α) be the associated critical value when the chosen significance level is α.

So, the "pre-test" estimator of β1 is of the following form:

β1* = b1U   ;  if  t2 > tc(α)

β1* = b1R   ;  if  t2 ≤ tc(α)

One final bit of notation will be particularly helpful in exposing the result that I want to illustrate.

Let I[A](x) be an "indicator function" that takes the value "1" if the random variable, x, lies in the interval A, and is zero otherwise. Let A' be the "complement" to the interval A. So, if x does not lie in A, it lies in A', and vice versa.

Note that I[A'](x) = 1 - I[A](x); and I[A](x)I[A'](x) = 0 , for any A and x.

Also, I[A](x) is a binary random variable, as it is a function of x.

Letting x = t2, we can write our pre-test estimator of β1 as:

β1* = I(tc(α) , ∞)(t2) b1U + I[0 , tc(α)](t2) b1R

= {1 - I[0 , tc(α)](t2)} b1U + I[0 , tc(α)](t2) b1R

= b1U + I[0 , tc(α)](t2) (b1R - b1U) .
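The case-by-case definition and the indicator decomposition above are the same object, and a two-line check confirms this (the numbers are hypothetical, and t2 is taken to be non-negative, in line with the post's interval notation):

```python
def pretest_cases(b1U, b1R, t2, t_crit):
    # definition by cases: b1U if t2 > t_crit, b1R otherwise
    return b1U if t2 > t_crit else b1R

def pretest_decomposed(b1U, b1R, t2, t_crit):
    # b1* = b1U + I[0, t_crit](t2) * (b1R - b1U)
    ind = 1.0 if 0.0 <= t2 <= t_crit else 0.0
    return b1U + ind * (b1R - b1U)

# values of t2 on both sides of a critical value of 1.645
for t2 in (0.3, 1.2, 1.7, 3.0):
    assert pretest_cases(2.0, 1.4, t2, 1.645) == pretest_decomposed(2.0, 1.4, t2, 1.645)
```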

Let's think about the biases of b1U and b1R, under the assumptions associated with our model.
• b1U is always unbiased, whether H0 is true or false. (Over-fitting the model will not introduce a bias in the OLS estimator.)
• b1R is unbiased if H0 is true, but it is biased if H0 is false. (Omitting a relevant regressor biases the OLS estimator.)
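A small Monte Carlo exercise makes both bullet points concrete (the parameter values are illustrative choices of mine; x1 and x2 are deliberately correlated, since omitting x2 biases b1R only when the regressors are correlated):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 5000
beta1, beta2, sigma = 1.0, 0.5, 1.0

# fixed (non-random) regressors, correlated so that omitting x2 matters
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
X = np.column_stack([x1, x2])

b1U, b1R = np.empty(reps), np.empty(reps)
for r in range(reps):
    u = sigma * rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + u        # H0 is false: beta2 = 0.5
    b = np.linalg.solve(X.T @ X, X.T @ y)
    b1U[r] = b[0]                          # unrestricted OLS
    b1R[r] = (x1 @ y) / (x1 @ x1)          # restricted OLS (x2 omitted)

print(f"mean of b1U = {b1U.mean():.3f} (true beta1 = {beta1})")
print(f"mean of b1R = {b1R.mean():.3f} (biased, since beta2 != 0)")
```

The average of b1U settles on the true β1, while the average of b1R does not.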
With this in mind, consider the expected value of β1*:

E[β1*] = E[b1U] + E{I[0 , tc(α)](t2) (b1R - b1U)}

= β1 + E{I[0 , tc(α)](t2) (b1R - b1U)}.

The second term on the right of the last equation is typically non-zero, and it's going to be messy. That's because t2 (and hence the indicator function) is not independent of (b1R - b1U).

This second term is the bias in the (pre-test) estimator, β1*.

So, even without going to all of the trouble to evaluate the bias exactly - and it actually is quite a lot of trouble, even for this very simple model - we can see that pre-testing in the context of OLS estimation will typically introduce a bias.
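The typical size of this bias is easy to see by simulation (a sketch with illustrative parameter values of my own; β2 is set small relative to its standard error, so that the pre-test frequently drops x2):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(1)
n, reps, alpha = 30, 20000, 0.05
beta1, beta2, sigma = 1.0, 0.3, 1.0        # beta2 small: x2 is often dropped

# fixed, correlated regressors
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
t_crit = t_dist.ppf(1 - alpha, n - 2)      # one-sided critical value

est = np.empty(reps)
for r in range(reps):
    y = beta1 * x1 + beta2 * x2 + sigma * rng.normal(size=n)
    b = XtX_inv @ X.T @ y                  # unrestricted OLS
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)
    t2 = b[1] / np.sqrt(s2 * XtX_inv[1, 1])
    # the pre-test estimator: b1U if we reject H0, b1R otherwise
    est[r] = b[0] if t2 > t_crit else (x1 @ y) / (x1 @ x1)

print(f"mean of pre-test estimator: {est.mean():.3f} (true beta1 = {beta1})")
```

The Monte Carlo mean of β1* sits noticeably above the true β1, even though each of the two component estimators is a perfectly respectable OLS estimator in its own setting.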

Is there any circumstance in which this bias will be zero?

This bias will be zero if b1R = b1U, and that in turn happens only if the OLS estimate of β2 in the original two-regressor model is exactly zero. Putting this extreme case to one side, the pre-test estimator of β1 will be biased.

The basic result demonstrated here extends fully to the case where we have a multiple regression model, and instead of testing a single "zero" restriction we test the validity of a set of independent linear restrictions on the coefficient vector, using an F-test.

Further, although pre-testing generally biases our regression estimator, it also affects the latter's efficiency. In other words, it affects the estimator's MSE. This suggests the following question: "Is pre-testing necessarily a 'bad' strategy?"

I'll take this up in more detail in a subsequent post, but in the meantime you might check out my earlier related posts (here and here), and the survey material on pre-testing in Giles and Giles (1993).

Reference

Giles, J. A. and D. E. A. Giles, 1993, “Pre-test estimation and testing in econometrics: Recent developments”, Journal of Economic Surveys, 7, 145-197.

1. From a superficial look at the links, I gather you have found that pre-testing can be justified by certain loss functions (e.g., mean squared error) under certain circumstances.

If I understand correctly, this analysis treats the likelihoods as probability distributions over the parameters, which makes sense only if they are seen as, in particular, the posterior distributions proceeding from an uninformative prior.

Question: is there any case for pre-testing now that doing full Bayesian computation is easy (as of course it was not when you wrote the survey paper)? It seems to me if you can specify a loss function, you might as well report your whole posterior together with expected loss as a function of the parameters. Then the loss-minimizing point-estimate is evident, together with much else; the purpose is clear; and alternative loss functions can be considered without muddling up the statistical starting point.

1. Michael - thanks for the comments. First, regarding Bayesian inference, I totally agree - my PhD in '75 was in Bayesian econometrics (and see various posts on this blog).

The point about the pre-testing literature (which goes back to Ted Bancroft's work in the 1940s) is the following:

1. It is dealing only with non-Bayesian inference, where the sampling distribution is a core concept. Hence the emphasis is on issues such as bias, MSE, etc.

2. It does not condone pre-testing, which is only very rarely an optimal strategy, and is always inadmissible.

3. Rather, the emphasis is on pointing out the (usually adverse) consequences of pre-testing.

4. (Non-Bayesian) applied researchers invariably pre-test, often in complex ways. Then they interpret their results as if this pre-testing had not taken place. That's where the problem lies. Their final estimators and tests don't have the properties that they ascribe to them.

The literature on pre-test issues continues to grow, which I believe is a good thing. A lot of this literature can be quite technical, so some simple illustrations can be helpful. Simulation exercises usually convince students about the associated issues pretty quickly too.