Monday, October 7, 2013

A Regression "Estimator" that Minimizes MSE

Let's talk about estimating the coefficients in a linear multiple regression model. We know from the Gauss-Markov Theorem that, within the class of linear and unbiased estimators, the OLS estimator is most efficient. Because it is unbiased, its MSE equals its variance, so it also has the smallest possible Mean Squared Error (MSE) within the linear and unbiased class of estimators.

However, there are many linear estimators which, although biased, have a smaller MSE than the OLS estimator. You might then think of asking: “Why don’t I try to find the linear estimator that has the smallest possible MSE?”

This certainly sounds like a sensible, well-motivated question. Unfortunately, however, attempting to pursue this line of reasoning yields an “estimator” that can’t actually be used in practice. It's non-operational.

Let's see why. I'll demonstrate this by using the simple linear regression model without an intercept, although the result generalizes to the usual multiple linear regression model. So, our model, with a non-random regressor, is:

                   yi = βxi + εi    ;     εi ~ i.i.d. [0 , σ2]  ;  i = 1, 2, ...., n.

Let β* be any linear estimator of β, so that we can write β* = Σaiyi, where the ai's are non-random weights, and all summations are taken over i = 1 to n.

So, E[β*] = βΣ(aixi), and

                Bias[β*] = β[Σ(aixi) - 1].                                                       (1)


               var.[β*] = Σ[ai2var.(yi)] = σ2Σ(ai2).                                       (2)

From (1) and (2),

              M = MSE[β*] = var.[β*] + (Bias[β*])2 = σ2Σ(ai2) + β2[Σ(aixi) - 1]2 .
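As a quick numerical sanity check (my own sketch, not part of the original derivation; the sample size, regressor, weights, β, and σ2 below are all made-up illustrative values), we can verify this MSE formula by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10
x = np.linspace(1.0, 2.0, n)     # fixed (non-random) regressor - illustrative values
beta, sigma2 = 2.0, 1.0          # illustrative "true" parameter values
a = x / x.sum()                  # an arbitrary (hypothetical) choice of linear weights a_i

# Theoretical MSE from the formula: sigma^2 * sum(a_i^2) + beta^2 * [sum(a_i x_i) - 1]^2
mse_theory = sigma2 * np.sum(a**2) + beta**2 * (np.sum(a * x) - 1.0)**2

# Monte Carlo estimate of the same quantity
reps = 200_000
eps = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
y = beta * x + eps
beta_star = y @ a                # beta* = sum_i a_i y_i, for each replication
mse_sim = np.mean((beta_star - beta)**2)

print(mse_theory, mse_sim)       # the two figures should agree closely
```

The simulated MSE of this (deliberately biased) linear estimator matches the analytical expression above to Monte Carlo accuracy.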

Now, let's find the weights in the construction of β* that will minimize that estimator's MSE. Let Mj' be the partial derivative of M with respect to a typical aj (for j = 1, 2, ...., n). Then:

             Mj' = 2σ2aj + 2β2[Σ(aixi) - 1]xj    ;   j = 1, 2, ...., n.                 (3)

Setting all of the equations in (3) to zero, multiplying by yj, and summing over j, we get:

            σ2β* + β2[Σ(aixi) - 1]Σ(xjyj) = 0 .                                              (4)

Similarly, setting all of the equations in (3) equal to zero, multiplying by xj, and summing over all j, we get:

             Σ(ajxj) = Σ(aixi) = [β2Σ(xi2)] / [σ2 + β2Σ(xi2)] .                         (5)

Substituting (5) into (4), and re-arranging the result, we finally get:

             β* = { [β2Σ(xi2)] / [σ2 + β2Σ(xi2)] }b ,                                      (6)

where b = [Σ(xiyi)] / [Σ(xi2)] is the OLS estimator of β.

So, the minimum MSE linear "estimator" of β is non-operational. It can't be applied, because it is a function of β and σ2, both of which are unknown. Yes, we could make the estimator operational by replacing the unknown parameters with their OLS estimators - but the resulting modified β* would then be nonlinear and, more importantly, it would no longer have any optimal MSE property.
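To see the point numerically, here's an illustrative simulation (again my own sketch, with made-up values for β and σ2). It feeds the true β and σ2 into equation (6) - exactly the information we wouldn't have in practice - and confirms that the resulting infeasible β* does beat OLS on MSE:

```python
import numpy as np

rng = np.random.default_rng(123)
n, reps = 10, 200_000
x = np.linspace(1.0, 2.0, n)     # fixed regressor - illustrative values
beta, sigma2 = 2.0, 1.0          # "true" values, treated as known (infeasible in practice)

sxx = np.sum(x**2)
shrink = (beta**2 * sxx) / (sigma2 + beta**2 * sxx)   # the factor from equation (6)

eps = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
y = beta * x + eps
b = (y @ x) / sxx                # OLS estimator, replication by replication
beta_star = shrink * b           # the infeasible minimum-MSE "estimator"

mse_ols = np.mean((b - beta)**2)
mse_star = np.mean((beta_star - beta)**2)
print(mse_ols, mse_star)         # beta* has the smaller MSE
```

The improvement is real but modest here; the point of the post stands, because computing `shrink` required knowing β and σ2 in the first place.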

Nice idea, but it didn't work! There's no viable linear estimator of the regression coefficient vector that minimizes MSE.
Finally, we can see from equation (6) that, because σ2 > 0, the factor multiplying b is strictly less than one, so β* is also a shrinkage "estimator" - it shrinks the value of b towards the origin.
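The shrinkage factor in (6), β2Σ(xi2) / [σ2 + β2Σ(xi2)], lies strictly between 0 and 1 whenever σ2 > 0, and approaches 1 as σ2 → 0. A small illustration (with made-up values for the regressor, β, and a range of σ2 values):

```python
import numpy as np

x = np.linspace(1.0, 2.0, 10)    # illustrative fixed regressor
sxx = np.sum(x**2)
beta = 2.0                       # illustrative true coefficient

for sigma2 in [0.01, 1.0, 10.0, 100.0]:
    shrink = (beta**2 * sxx) / (sigma2 + beta**2 * sxx)
    print(sigma2, shrink)        # always in (0, 1); closer to 1 when sigma2 is small
```

The noisier the data (larger σ2) relative to the signal β2Σ(xi2), the more aggressively the minimum-MSE rule would shrink the OLS estimate towards zero.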

© 2013, David E. Giles


  1. But that could work if I have prior information about Beta, right?

  2. Dave; dumb question: isn't an "estimator", by definition, something that can be calculated from the sample data *only*?

    If not, then here's my proposed estimator: beta itself! It has a MSE of precisely zero!

    1. Yes - usually, though the term "non-operational" is also used. Another way to phrase the point of my post - there's no linear estimator of beta that minimizes MSE.