Wednesday, December 28, 2011

When is the OLS estimator BLU?

Or, if you prefer, "When do the OLS and GLS estimators coincide?"

O.K., so you think you know the answer to this one? My guess is that you know a sufficient condition, but probably not a necessary and sufficient condition, for the OLS and GLS estimators of the coefficient vector in a linear regression model to coincide. Let's see if I'm right!

Let's start off with the model we'll be considering:

                                   y = Xβ + ε       ;     ε ~ [0 , V] ,

where X is non-stochastic and of full column rank, and V is at least positive semi-definite. Note that we don't need the errors to be normally distributed. Also, much of what follows holds even when V is singular.

Recall that the OLS estimator of β is bO = (X'X)⁻¹X'y, while the GLS (Aitken) estimator is bG = (X'V⁻¹X)⁻¹X'V⁻¹y, if V is positive-definite.

If the X matrix is non-random and V is positive-definite, then the GLS estimator is BLU, by the Gauss-Markov Theorem. We all know that a sufficient condition for the OLS and GLS estimators to coincide, and for bO to be BLU, is that V = σ²I. Or, if you like, V is a scalar matrix. But, is that all that we can say about the connection between these two estimators? Presumably not, or there'd be no point to this post!

There are various (mathematically equivalent) ways of stating a necessary and sufficient condition for the OLS estimator to be BLU in our situation, and some of the relevant references are Rao (1967), Zyskind (1967), Kruskal (1968), and Watson (1972). There's also a very accessible discussion by Puntanen and Styan (1989).

Here are two of the more easily digestible ways of phrasing the condition:

For the model defined above, where V is at least positive semi-definite, the OLS estimator of β is BLU if and only if there exists a matrix, Q, such that VX = XQ.

Equivalently, the OLS estimator is BLU if and only if V takes the form V = XΓX' + ZΘZ' + λI; where X'Z = 0, Γ and Θ are arbitrary matrices, and λ is an arbitrary scalar.

When these necessary and sufficient conditions are satisfied, and V is non-singular, the OLS and GLS estimators are identical.
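To see why the first form of the condition is enough when V is non-singular, here is a short worked derivation. It is only a sketch of the sufficiency direction, and it assumes that V is symmetric and invertible and that X has full column rank k, as in the model above. Under those assumptions Q must itself be non-singular, because VX has rank k and VX = XQ.

```latex
\begin{align*}
VX = XQ \;&\Longrightarrow\; X = V^{-1}XQ \;\Longrightarrow\; V^{-1}X = XQ^{-1} \\
        \;&\Longrightarrow\; X'V^{-1} = (Q')^{-1}X' \qquad \text{(transposing, with } V' = V\text{)} \\[4pt]
b_G &= (X'V^{-1}X)^{-1}X'V^{-1}y \\
    &= \bigl[(Q')^{-1}X'X\bigr]^{-1}(Q')^{-1}X'y \\
    &= (X'X)^{-1}Q'(Q')^{-1}X'y \\
    &= (X'X)^{-1}X'y = b_O .
\end{align*}
```

The necessity direction, and the case where V is singular, need more care; the references cited above cover them.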

What if we want to focus on just a subset of the coefficients in the regression model, and ask when OLS will be BLU for that subset? In this case, a necessary and sufficient condition is provided by Krämer et al. (1996).

[As an aside, note that if V = σ²I, then setting Q = σ²I (with this identity matrix being of order k) satisfies the first condition above, trivially. What choices for Γ, Z, and λ will satisfy the second form of the condition?]
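The two forms of the condition stated above can also be checked numerically. The following is a minimal NumPy sketch, not anything taken from the references: the dimensions, the random seed, and the helper names `random_psd`, `exists_Q` and `weights` are all hypothetical choices made for illustration. It builds one V of the form XΓX' + ZΘZ' + λI and one unstructured positive-definite V, then asks in each case whether VX = XQ has a solution and whether the OLS and GLS estimators coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3                      # illustrative sizes, not from the post
X = rng.normal(size=(n, k))       # full column rank with probability one


def random_psd(m):
    """Return a random (m x m) positive semi-definite matrix."""
    A = rng.normal(size=(m, m))
    return A @ A.T


def exists_Q(V, X):
    """Check whether VX = XQ is solvable, using the least-squares candidate Q."""
    Q, *_ = np.linalg.lstsq(X, V @ X, rcond=None)
    return np.allclose(V @ X, X @ Q)


def weights(V, X):
    """Return the matrices W such that b = W y, for OLS and for GLS."""
    W_ols = np.linalg.solve(X.T @ X, X.T)
    Vinv_X = np.linalg.solve(V, X)              # V^{-1} X
    W_gls = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)
    return W_ols, W_gls


# Build Z with X'Z = 0 by projecting random columns off the columns of X.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
Z = M @ rng.normal(size=(n, 2))

# A V of the second form: V = X Gamma X' + Z Theta Z' + lambda I.
V_good = X @ random_psd(k) @ X.T + Z @ random_psd(2) @ Z.T + 0.5 * np.eye(n)
# A generic positive-definite V with no special structure.
V_other = random_psd(n) + np.eye(n)

good_condition = exists_Q(V_good, X)
good_same = np.allclose(*weights(V_good, X))
other_condition = exists_Q(V_other, X)
other_same = np.allclose(*weights(V_other, X))

print(good_condition, good_same)
print(other_condition, other_same)
```

Comparing the weight matrices, rather than estimates from one simulated y, checks that the two estimators agree for every possible y. For the structured V both checks should succeed, and for the unstructured V both should fail, up to floating-point tolerance.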

As far as the statistics literature is concerned, this is old news. Interestingly, this important statistical result rarely gets a mention in econometrics courses, or econometrics text books. With regard to the latter, one important exception is Amemiya (1985).

This might make you think that although the Rao-Zyskind results are of relevance to statisticians, they don't have much application when it comes to econometrics. Well, you'd be wrong! There are actually quite a lot of results in econometrics that can be established very easily using this theorem. In several cases, this provides the simplest and most transparent way of obtaining them. McAleer (1992) provides an excellent discussion of all of this.

Here are just two econometric examples:

1. A Random coefficient model
Consider the following k-regressor random coefficient regression model  -

          y = Xδ + v   ; v ~ [0 , σ_v² I ]

          δ = β + ε    ;  ε ~ [0 , Γ ]  ,

where  Γ is (k x k) and ε and v are independent.

On the face of it, this doesn't look like a situation where the OLS estimator will be efficient. However, note the following. We can write the model as:

          y = Xβ + (v + Xε) = Xβ + u   ;   u ~ [0 , V ] ,

where V = (XΓX' + σ_v² I), given the independence of ε and v.

Notice that VX = (XΓX'X + σ_v² X) = X(ΓX'X + σ_v² I) = XQ, say.

So, the OLS estimator of β in this model is in fact BLU, by the first condition in the above Theorem. (Remember that Q can be chosen quite arbitrarily.)

Also, notice that the independence of ε and v could be relaxed to uncorrelatedness if these two random terms were each normally distributed.
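The random coefficient result is easy to confirm numerically. Here is a minimal NumPy sketch under assumed values: the sample size, the true β, the value of σ_v², the matrix Γ, and the use of normal draws are all illustrative choices (the model itself only specifies the first two moments). It checks that VX = XQ for the Q given above, and that OLS and GLS give the same estimate on one simulated sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3                          # illustrative sizes
sigma_v2 = 2.0                        # assumed value of sigma_v^2

X = rng.normal(size=(n, k))
A = rng.normal(size=(k, k))
Gamma = A @ A.T                       # an arbitrary (k x k) covariance matrix for eps

# Covariance matrix of u = v + X eps, and the Q from the text.
V = X @ Gamma @ X.T + sigma_v2 * np.eye(n)
Q = Gamma @ X.T @ X + sigma_v2 * np.eye(k)
condition_holds = np.allclose(V @ X, X @ Q)

# Simulate one sample from the model (normality is an assumption made here).
beta = np.array([1.0, -2.0, 0.5])
eps = rng.multivariate_normal(np.zeros(k), Gamma)
v = rng.normal(scale=np.sqrt(sigma_v2), size=n)
y = X @ (beta + eps) + v

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
Vinv_X = np.linalg.solve(V, X)                          # V^{-1} X
b_gls = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)     # uses the true V
estimates_agree = np.allclose(b_ols, b_gls)

print(condition_holds, estimates_agree)
```

Note that the GLS estimate here uses the true V, which would not be known in practice. That is part of what makes the result useful: OLS delivers the same answer without needing Γ or σ_v².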

2. The SUR model
This is one that I especially like, given my earlier work on this model (e.g., Srivastava and Giles, 1987)!

Here, we have a system of m equations, each of the form

          yi = Xi βi + εi    ;     εi  ~ [0 , σii I ]

where there are n observations on all of the variables, and ki elements in βi ; i = 1, ...., m.

You'll recall that the m equations of the SUR model can be "stacked up", and the model can be written more compactly as:

          y = Xβ + ε   ;    ε ~ [0 , V]

where y and ε are (mn x 1); β is (K x 1), where K = k1 + .... + km; V = (Σ ⊗ I), where '⊗' denotes the Kronecker product operator; and Σ = [σij], i, j = 1, ...., m.

In introducing this model in 1962, one of Arnold Zellner's insights was that the system of equations could be written as a single "big" equation, with a non-scalar covariance matrix, V, suggesting the application of a GLS-type estimator.

Now let's consider the Rao-Zyskind result. The OLS estimator of β will be BLU, if and only if VX = XQ, where Q = [Qij], and  Qij is a (ki x kj) matrix; i, j = 1, ...., m.

This condition can be re-stated as σij Xj = Xi Qij ; for i, j = 1, ...., m.

(i) Now, you'll notice that if the errors in the different equations are all uncorrelated with each other, so that σij = 0 for i ≠ j, then the Rao-Zyskind condition is obviously satisfied. Just set Qij = 0 for i ≠ j, and Qii = σii I. We can do this - the choice is quite arbitrary. So, the OLS estimator is efficient in this case, as we all know.

(ii) If every equation in the system has identical regressors, then Xj = Xi, and the Rao-Zyskind condition is again met - just set Qij = σij I to see this, bearing in mind that ki = kj in this case. OLS is efficient in this situation too, as is also well-known.

Neither of these two results is especially difficult to prove by other methods, especially if you are comfortable using the Kronecker product operator for matrices. However, here's another result relating to the SUR estimator that's tedious to prove by conventional approaches, but which emerges almost trivially if you use the Rao-Zyskind result.

(iii) If the regressors in the various equations are not all the same, and if σij ≠ 0, for i ≠ j, we can use the Rao-Zyskind result to determine the conditions under which OLS will be BLU. Remember that this condition can be stated as σij Xj = Xi Qij; for i, j = 1, ...., m.

Now, pre-multiply each side of this relationship by the projection matrix, Mi = I - Xi (Xi'Xi)⁻¹Xi', and recall that Mi Xi = 0. So, we have σij Mi Xj = 0 ; for i, j = 1, ...., m. Of course, we can interchange the 'i' and 'j' subscripts and get the result σji Mj Xi = 0 ; for i, j = 1, ...., m. Whenever σij ≠ 0, this says that Mi Xj = 0 and Mj Xi = 0, so the columns of each of Xi and Xj lie in the column space of the other.

In other words:

OLS will be BLU in the SUR model if the range space of Xi equals the range space of Xj, in all equations for which σij ≠ 0, for i ≠ j.

This result drops out really simply here - especially when you compare this proof with the really tedious one (based on Dwivedi and Srivastava, 1978) that Viren and I gave in our book!
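The SUR cases discussed above can be illustrated with a small numerical experiment. What follows is a minimal NumPy sketch for m = 2 equations, with hypothetical helper names (`block_diagonal`, `ols_equals_gls`) and made-up values for Σ and the regressors. It stacks the system, forms V as the Kronecker product of Σ and an identity matrix of order n, and compares the OLS and GLS weight matrices in three situations: regressors with the same range space, unrelated regressors with correlated errors, and unrelated regressors with uncorrelated errors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15                                  # observations per equation (illustrative)


def block_diagonal(blocks):
    """Stack the equation-specific regressor matrices into the 'big' X."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


def ols_equals_gls(X_blocks, Sigma):
    """True if OLS and GLS coincide for the stacked SUR system."""
    X = block_diagonal(X_blocks)
    V = np.kron(Sigma, np.eye(n))       # block (i, j) of V is sigma_ij * I_n
    W_ols = np.linalg.solve(X.T @ X, X.T)
    Vinv_X = np.linalg.solve(V, X)      # V^{-1} X
    W_gls = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)
    return np.allclose(W_ols, W_gls)


Sigma_corr = np.array([[1.0, 0.6], [0.6, 2.0]])     # sigma_12 != 0
Sigma_diag = np.diag([1.0, 2.0])                    # sigma_12 == 0

X1 = rng.normal(size=(n, 2))
A = np.array([[1.0, 2.0], [0.0, 1.0]])              # non-singular, so X1 @ A spans the same space
X2_same_range = X1 @ A
X2_unrelated = rng.normal(size=(n, 2))

same_range = ols_equals_gls([X1, X2_same_range], Sigma_corr)
unrelated_corr = ols_equals_gls([X1, X2_unrelated], Sigma_corr)
unrelated_uncorr = ols_equals_gls([X1, X2_unrelated], Sigma_diag)

print(same_range, unrelated_corr, unrelated_uncorr)
```

The first and third cases should show OLS and GLS coinciding, and the second should not. Note that in the first case the regressors themselves differ across the two equations; only their range spaces are the same.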

As I mentioned above, there are many other interesting econometrics results that can be established very easily by using the Rao-Zyskind result.

So, the take-home messages for this post are pretty simple -
  • Don't believe everything you read in your favourite econometrics text book.
  • Don't assume that this text book is statistically thorough.
  • Become more familiar with the statistics literature - it will probably make your econometric life a lot easier!

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.


Amemiya, T., 1985. Advanced Econometrics, Harvard University Press, Cambridge MA.

Dwivedi, T. D. and V. K. Srivastava, 1978. Optimality of least squares in seemingly unrelated regression equation model. Journal of Econometrics, 7, 391-395.

Krämer, W., R. Bartels and D. G. Fiebig, 1996. A final twist on the equality of OLS and GLS. Statistical Papers, 37, 277-281.

Kruskal, W., 1968. When are Gauss-Markov and least squares estimators identical? A coordinate-free approach. Annals of Mathematical Statistics, 39, 70-75.

McAleer, M., 1992. Efficient estimation: The Rao-Zyskind condition, Kruskal's theorem and ordinary least squares. Economic Record, 68, 65-72.

Puntanen, S. and G. P. H. Styan, 1989. The equality of the ordinary least squares estimator and the best linear unbiased estimator. American Statistician, 43, 153-161.

Rao, C. R., 1967. Least squares theory using an estimated covariance matrix and its application to measurement of signals. In L. M. LeCam and J. Neyman (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 355-372.

Srivastava, V. K. and D. E. A. Giles, 1987. Seemingly Unrelated Regression Equations Models: Estimation and Inference. Marcel Dekker, New York.

Watson, G. S., 1972. Prediction and the efficiency of least squares. Biometrika, 59, 91-98.

Zyskind, G., 1967. On canonical forms, non-negative covariance matrices and best simple linear least squares estimators in linear models. Annals of Mathematical Statistics, 38, 1092-1109.

© 2011, David E. Giles


  1. Dr. Giles, so we do not need to worry much about normality of residuals in regression, right? Only about heteroskedasticity? Is this true in general, or are there exceptions?


  2. Anonymous: I'm not sure how you inferred that from this post. The Rao-Zyskind condition in the theorem doesn't require normality of the errors. That's all I said.

  3. In order to conduct hypothesis tests, one only needs to ensure that the OLS assumptions about the error terms are satisfied before proceeding. Is this true?

    1. Which assumptions do you have in mind?