When we're learning the basics of least squares regression analysis, one of the topics that we invariably encounter is the consequences of model mis-specification. In particular, we're taught that omitting relevant regressors from the model renders the OLS estimator biased and inconsistent, although its precision is improved. On the other hand, including extraneous regressors simply reduces the efficiency of the OLS estimator of the coefficient vector; that estimator is still unbiased (and consistent) in this case.
These results are just special cases of those associated with imposing false restrictions on the parameter space, or failing to impose valid restrictions. So, once these more general results have been covered there's really no need to treat the "omitted regressors" and "extraneous regressors" situations as a separate matter.
However, usually they are dealt with as a distinct topic. What I find interesting, and what I want to focus on here, is the way in which the unbiasedness of OLS can be demonstrated in the context of irrelevant regressors. There's an easy way to get this result, and there's a more tedious proof. Let's begin by looking at the easy way.
Here's the set-up for our problem. The correct data-generating process (DGP) is:
y = X1β1 + ε ; ε ~ [0 , σ2 In] (1)
but the model that is estimated is:
y = X1β1 + X2β2 + u (2)
where X1 and X2 are both non-random and of full rank, k1 and k2 respectively.
So, our OLS estimator of the full coefficient vector in (2) is b = (X'X)-1X'y, where X = (X1 , X2).
Given that (1) is the true DGP, we can write
b = (X'X)-1X'(X1β1 + ε). (3)
When I'm teaching this stuff, what I do next is to introduce the following zero-one "selection matrix":
S' = ( I , 0')
and note that we can write X1 = XS. Immediately, it follows that
b = Sβ1 + (X'X)-1X'ε , and so E[b] = Sβ1 (because X is non-random and E[ε] = 0).
Because b' = (b1' , b2'), we have the result that E[b1] = β1 and E[b2] = 0 (= β2).
So, both sub-vectors of b are unbiased estimators for the corresponding coefficient sub-vectors.
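If you want to see this in action, here's a minimal numerical sketch in Python (the sample size, coefficient values, and seed are just illustrative choices, not part of the set-up above). It checks the identity (X'X)-1X'X1 = S directly, and then verifies by simulation that the averages of b1 and b2 sit close to β1 and 0:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k1, k2 = 100, 2, 3
X1 = rng.normal(size=(n, k1))            # relevant regressors (held fixed)
X2 = rng.normal(size=(n, k2))            # irrelevant regressors (held fixed)
X = np.hstack([X1, X2])
beta1 = np.array([1.0, -2.0])            # illustrative true coefficients

# The selection matrix S, with S' = (I, 0), so that X1 = X S
S = np.vstack([np.eye(k1), np.zeros((k2, k1))])
assert np.allclose(X @ S, X1)

# The key step in the "easy" proof: (X'X)^(-1) X' X1 = (X'X)^(-1) X' X S = S
assert np.allclose(np.linalg.solve(X.T @ X, X.T @ X1), S)

# Monte Carlo check that E[b1] = beta1 and E[b2] = 0
reps = 5000
b_bar = np.zeros(k1 + k2)
for _ in range(reps):
    y = X1 @ beta1 + rng.normal(size=n)                # true DGP (1)
    b_bar += np.linalg.solve(X.T @ X, X.T @ y) / reps  # OLS in model (2)

print(b_bar[:k1])   # approximately beta1 = [1, -2]
print(b_bar[k1:])   # approximately [0, 0, 0]
```

The zero-one structure of S does all the work here: once (X'X)-1X'X1 collapses to S, the unbiasedness of both sub-vectors falls out immediately.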
Alright, that was easy enough. What's the difficult way to do this?
In equation (3), the (X'X)-1 matrix can be written as a partitioned inverse, and we can proceed, laboriously, as follows:
b1 = [X1'X1 - X1'X2(X2'X2)-1X2'X1]-1X1'y - [X1'X1 - X1'X2(X2'X2)-1X2'X1]-1X1'X2(X2'X2)-1X2'y ,
so
E[b1] = QX1'X1β1 - QX1'X2(X2'X2)-1X2'X1β1 ,
where
Q = [X1'X1 - X1'X2(X2'X2)-1X2'X1]-1 .
So,
E[b1] = Q[X1'X1 - X1'X2(X2'X2)-1X2'X1]β1 = QQ-1β1 = β1 .
You can then go through the same agony to prove that E[b2] = 0, if you really want to!
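If you'd rather let the computer suffer through that agony, here's a small sketch (again with purely illustrative, made-up data) confirming that the partitioned-inverse expression for b1 matches the first k1 elements of the direct OLS fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 60, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])
y = X1 @ np.array([0.5, 1.5]) + rng.normal(size=n)   # illustrative data

# Direct OLS on the over-specified model; take the first k1 elements
b = np.linalg.solve(X.T @ X, X.T @ y)
b1_direct = b[:k1]

# The "difficult" way: Q = [X1'X1 - X1'X2(X2'X2)^(-1)X2'X1]^(-1)
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)   # projection onto the columns of X2
Q = np.linalg.inv(X1.T @ X1 - X1.T @ P2 @ X1)
b1_partitioned = Q @ X1.T @ y - Q @ X1.T @ P2 @ y

assert np.allclose(b1_direct, b1_partitioned)
print(b1_direct)
```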
Various types of zero-one matrices can be used in all sorts of ways to make life easier in econometrics, and the use of the selection matrix here is a good example. A comprehensive discussion of the use of these matrices is given by Turkington (2001).
References
Turkington, D. A., 2001. Matrix Calculus and Zero-One Matrices: Statistical and Econometric Applications. Cambridge University Press, Cambridge.
The selection matrix works for showing unbiasedness, but if you want to show that adding irrelevant regressors lowers precision, you have to use Frisch-Waugh...(which uses the partitioned inverse, although you can hide that from your students by solving equations).
Actually, the proof in the case of omitted variables is a 3-liner, and doesn't require Frisch-Waugh:
b1 = (X1'X1)^(-1)X1'y, so E[b1] = beta1 + (X1'X1)^(-1)X1'X2 beta2. b1 is biased except when X1'X2 = 0.
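A quick numerical check of that bias formula (with purely illustrative, made-up values; nothing here comes from the thread itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 200, 2, 1
X1 = rng.normal(size=(n, k1))
X2 = X1 @ np.array([[0.7], [0.3]]) + rng.normal(size=(n, k2))  # correlated with X1
beta1, beta2 = np.array([1.0, -1.0]), np.array([2.0])

# Monte Carlo: regress y on X1 only, omitting the relevant X2
reps = 5000
b1_bar = np.zeros(k1)
for _ in range(reps):
    y = X1 @ beta1 + X2 @ beta2 + rng.normal(size=n)
    b1_bar += np.linalg.solve(X1.T @ X1, X1.T @ y) / reps

theoretical_bias = np.linalg.solve(X1.T @ X1, X1.T @ X2) @ beta2
print(b1_bar - beta1)       # simulated bias ...
print(theoretical_bias)     # ... matches (X1'X1)^(-1)X1'X2 beta2
```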
The point of the post was that often the use of a zero-one matrix provides a simple approach - not that it ALWAYS simplifies a proof. I urge you to take a look at Darrell Turkington's book that I cite - it has some compelling examples.
Nice post, Prof. Giles! Just to let you know, there is an easy way to display nice maths in Blogger using the TeX typesetting system; it is very well described in this blog post:
http://holdenweb.blogspot.co.uk/2011/11/blogging-mathematics.html
Thanks - I'll check that out!