Econometrics Beat: Dave Giles' Blog: Proxy Variables and Biased Estimation

Sunday, December 15, 2013

Proxy Variables and Biased Estimation

Here's a problem from the exam that one of my econometrics classes sat recently. It's to do with some of the consequences of mis-specifying a regression model, and then applying OLS estimation.

Specifically, let's suppose that the data-generating process (the correct model specification) is actually of the form:

y = Xβ + ε ; ε ~ [0 , σ²I_n] . (1)

However, we can't observe the k variables in the X matrix, and instead we replace them with k "proxy variables" (substitutes) that we can observe. So, the model that we actually estimate is:

y = X*β + v . (2)

The students were asked to show that the usual (unbiased) estimator of σ² is actually biased in this case; and they were asked if they could determine the "direction" of the bias.

If v* is the residual vector after we estimate (2) by OLS, then the estimator of σ² that we'd construct would be

σ*² = v*'v* / (n - k) = y'M*y / (n - k),
where
M* = I_n - X*(X*'X*)^-1X*' .
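
As a quick numerical illustration, the sketch below builds M* for a made-up proxy matrix and checks that the residual-based and quadratic-form expressions for σ*² agree. The function names (annihilator, sigma2_hat), the dimensions, and the simulated data are assumptions chosen purely for illustration; they are not part of the original exam question.

```python
import numpy as np

def annihilator(X_star):
    """Return M* = I_n - X*(X*'X*)^{-1}X*' for a full-column-rank X*."""
    n = X_star.shape[0]
    # Solve (X*'X*) Z = X*' rather than forming the inverse explicitly.
    Z = np.linalg.solve(X_star.T @ X_star, X_star.T)
    return np.eye(n) - X_star @ Z

def sigma2_hat(y, X_star):
    """sigma*^2 = v*'v* / (n - k), with v* the OLS residuals of y on X*."""
    n, k = X_star.shape
    coef, *_ = np.linalg.lstsq(X_star, y, rcond=None)
    resid = y - X_star @ coef
    return resid @ resid / (n - k)

rng = np.random.default_rng(0)
n, k = 50, 3
X_star = rng.normal(size=(n, k))   # hypothetical proxy regressors
y = rng.normal(size=n)             # any y will do for this identity

M_star = annihilator(X_star)
s2_resid = sigma2_hat(y, X_star)          # v*'v* / (n - k)
s2_quad = y @ M_star @ y / (n - k)        # y'M*y / (n - k)
print(s2_resid, s2_quad)
```

The two printed values should match up to rounding error, because v* = M*y and M* is symmetric and idempotent.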

Now, the correct expression for y is given by (1), so

σ*² = (Xβ + ε)'M*(Xβ + ε) / (n - k)

= [β'X'M*Xβ + ε'M*ε + 2ε'M*Xβ] / (n - k) ,
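and, treating X and X* as non-random and using E[ε] = 0, the cross-product term has zero expectation, so
E[σ*²] = {β'X'M*Xβ + E[ε'M*ε]} / (n - k) . (3)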

Note that each term in (3) is scalar, so

E[ε'M*ε] = E{tr.[ε'M*ε]} = E{tr.[M*ε ε']} = tr.{E[M*ε ε']}

= tr.{M*σ²I_n} = σ² tr.(M*) = σ² (n - k).
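
The trace result can be checked numerically. The sketch below is only an illustration under assumed values (n = 40, k = 3, σ² = 2, normally distributed errors, and a randomly generated proxy matrix); it compares tr.(M*) with n - k and a simulated average of ε'M*ε with σ²(n - k).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma2 = 40, 3, 2.0                 # illustrative values only
X_star = rng.normal(size=(n, k))          # hypothetical proxy regressors

# M* = I_n - X*(X*'X*)^{-1}X*'
M_star = np.eye(n) - X_star @ np.linalg.solve(X_star.T @ X_star, X_star.T)
trace_M = np.trace(M_star)

# Draw many error vectors with variance sigma2 and average eps'M*eps.
reps = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
quad_forms = ((eps @ M_star) * eps).sum(axis=1)   # one eps'M*eps per row
mc_mean = quad_forms.mean()

print(trace_M, mc_mean, sigma2 * (n - k))
```

The trace should equal n - k up to rounding, and the simulated mean should be close to σ²(n - k), with the gap shrinking as the number of replications grows.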

So,
E[σ*²] = β'X'M*Xβ / (n - k) + σ²,

and our estimator of σ² is biased, with a bias of β'X'M*Xβ / (n - k).

Finally, note that as M* is symmetric and idempotent it is (at least) positive semi-definite: β'X'M*Xβ = (M*Xβ)'(M*Xβ) ≥ 0. As (n - k) > 0, our estimator has a non-negative bias.
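
A small Monte Carlo experiment makes the size and sign of the bias concrete. This is a sketch under assumptions that are not in the exam question: the proxies are taken to be the true regressors plus independent noise, the errors are normal, and the values of n, k, β and σ² are arbitrary. Both X and X* are held fixed across replications, in line with the derivation.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, sigma2 = 60, 2, 1.0                      # illustrative values only
beta = np.array([1.0, 2.0])                    # illustrative coefficients

X = rng.normal(size=(n, k))                    # "true" regressors, held fixed
X_star = X + rng.normal(size=(n, k))           # noisy proxies, held fixed

M_star = np.eye(n) - X_star @ np.linalg.solve(X_star.T @ X_star, X_star.T)

# Theoretical bias: beta'X'M*X beta / (n - k)
mu = X @ beta
bias_theory = mu @ M_star @ mu / (n - k)

# Monte Carlo: redraw eps, rebuild y from (1), estimate sigma^2 from (2).
reps = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
Y = mu + eps                                   # each row is one sample of y
s2 = ((Y @ M_star) * Y).sum(axis=1) / (n - k)  # y'M*y / (n - k), row by row
bias_mc = s2.mean() - sigma2

print(bias_theory, bias_mc)
```

The simulated bias should sit close to the theoretical value, and both should be positive.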

The exercise can be taken a step further by asking "under what condition(s), if any, will this bias be zero?"

Putting to one side the uninteresting situation where Xβ = 0, we're left with the following condition: the estimator will be unbiased for every β if M*X = 0 (or, equivalently, if X'M* = 0). Let's interpret this condition. M*X = 0 means that every column of X lies in the column space of X*. Given that the problem has been set up so that models (1) and (2) each have the same number (k) of regressors, this happens only if X = X*A for some non-singular (k × k) matrix A, with X = X* as the obvious special case. In this case, the proxies carry exactly the same information as the correct variables.

So, replacing all of the regressors with proxy variables implies that the usual unbiased estimator of σ² will be biased upwards, unless the proxies happen to span the same space as the correct regressors.
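
The zero-bias condition M*X = 0 can also be examined numerically. The sketch below compares two made-up sets of proxies: one that is an exact non-singular linear transformation of X (so that X = X*A for the hypothetical matrix A below), and one that is X plus noise. The matrix A, the noise scale, and the helper names are all assumptions for illustration.

```python
import numpy as np

def annihilator(X_star):
    """M* = I_n - X*(X*'X*)^{-1}X*' for a full-column-rank X*."""
    n = X_star.shape[0]
    return np.eye(n) - X_star @ np.linalg.solve(X_star.T @ X_star, X_star.T)

def bias(X, X_star, beta):
    """Bias of sigma*^2: beta'X'M*X beta / (n - k)."""
    n, k = X_star.shape
    mu = X @ beta
    return mu @ annihilator(X_star) @ mu / (n - k)

rng = np.random.default_rng(7)
n, k = 30, 3
beta = np.array([1.0, -0.5, 2.0])          # illustrative coefficients
X = rng.normal(size=(n, k))                # "true" regressors

# Case 1: proxies span the same space as X (X = X* A, A non-singular).
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, -1.0]])
X_star_same_span = X @ np.linalg.inv(A)

# Case 2: proxies are X contaminated with independent noise.
X_star_noisy = X + 0.5 * rng.normal(size=(n, k))

bias_same = bias(X, X_star_same_span, beta)
bias_noisy = bias(X, X_star_noisy, beta)
print(bias_same, bias_noisy)
```

In the first case the bias should be zero up to rounding error; in the second it should be strictly positive.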

You might check out the following variation on the problem. What if there are k* > k proxy variables in model (2)? What if there are k* < k proxy variables? Do you get such an unambiguous result in these cases?

2 comments:

Anonymous, December 15, 2013 at 1:23 PM
Missed some points on that one, haha. Thanks for a good term Dr. Giles!
Anonymous, September 7, 2016 at 4:02 AM
Strictly speaking, bias is 0 as long as X = X*A, where A is (presumably) a full rank k×k matrix.