## Sunday, November 25, 2012

### Is the Cochrane-Orcutt Estimator Unique?

One of the work-horses of econometric modelling is the Cochrane-Orcutt (1949) estimator, or some variant of it such as the Beach-MacKinnon (1978) full ML estimator. The C-O estimator was proposed by Cochrane and Orcutt as a modification to OLS estimation when the errors are autocorrelated. Those authors had in mind errors that follow an AR(1) process, but it is easily adapted for any AR process.

I've blogged elsewhere about the historical setting for the work by Cochrane and Orcutt.

Given the limited computing power available at the time, the C-O estimator was a pragmatic solution to the problem of obtaining the GLS estimator of the regression coefficients, and approximating the full ML estimator. Students of econometrics will be familiar with the iterative process associated with the C-O estimator, as outlined below.

The use of this estimator leads to some interesting questions. Is this iterative scheme guaranteed to converge in a finite number of iterations? Is there a unique solution to this convergence problem, or can multiple local solutions (minima) occur?

Although the answers to these questions are quite well documented, unfortunately they're rarely addressed in econometrics textbooks. That's a pity, because the answers are simply stated, and they provide some useful insights for other such iterative estimators.

Let's recall the iterative version of the Cochrane and Orcutt estimator. At time "t", let's write our regression model as:

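$$y_t = x_t'\beta + \varepsilon_t ; \quad t = 1, 2, 3, \ldots, T \tag{1}$$

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t ; \quad |\rho| < 1$$

$$E[u_t] = 0 ; \quad \mathrm{var}[u_t] = \sigma_u^2 ; \quad E[u_t u_s] = 0 \;\; (t \neq s).$$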

Subtracting ρ times the one-period lagged version of equation (1) from equation (1) itself, we get:

$$(y_t - \rho y_{t-1}) = (x_t' - \rho x_{t-1}')\beta + u_t ; \quad t = 2, 3, \ldots, T \tag{2}$$

The procedure proposed by Cochrane and Orcutt was essentially as follows:

(i) Estimate (1) by OLS and obtain the residuals, $e_t$ (t = 1, 2, ..., T).

(ii) Obtain a consistent estimate of ρ by regressing $e_t$ on $e_{t-1}$ (with no intercept), using OLS. Call the estimate "r".

(iii) Replace ρ with r in equation (2), and estimate that model by OLS. This yields a new estimate of β.

(iv) Using this estimate of β, re-compute the residuals from (1), and re-estimate ρ, as in step (ii).

(v) Use this new value of r in place of ρ in equation (2), and re-estimate that model by OLS.

(vi) Go to (iv), and iterate to convergence.
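Steps (i) to (vi) can be sketched in a few lines of Python with NumPy. This is only a minimal illustration: the function names, the convergence tolerance, the iteration cap, and the simulated data are all my own assumptions, not anything taken from Cochrane and Orcutt's paper.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def estimate_rho(e):
    """Step (ii): regress e_t on e_{t-1} with no intercept."""
    return float(e[1:] @ e[:-1] / (e[:-1] @ e[:-1]))


def cochrane_orcutt(X, y, tol=1e-8, max_iter=200):
    """Iterate steps (i)-(vi). Returns (beta, r, iterations used)."""
    beta = ols(X, y)                      # step (i)
    r = estimate_rho(y - X @ beta)        # step (ii)
    for it in range(1, max_iter + 1):
        # Steps (iii)/(v): OLS on equation (2), which uses t = 2, ..., T.
        beta = ols(X[1:] - r * X[:-1], y[1:] - r * y[:-1])
        # Step (iv): residuals from equation (1), then re-estimate rho.
        r_new = estimate_rho(y - X @ beta)
        if abs(r_new - r) < tol:
            return beta, r_new, it
        r = r_new
    return beta, r, max_iter


# Simulated data (illustrative values only): intercept 1, slope 2, rho = 0.6.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])

beta_hat, r_hat, n_iter = cochrane_orcutt(X, y)
print(beta_hat, r_hat, n_iter)
```

Note that the stopping rule here is based on the change in r alone; a rule based on the change in the sum of squares, or in all of the parameters, would be just as reasonable.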

[Of course, we could also write equation (2) in the form:

$$y_t = \rho y_{t-1} + x_t'\beta - x_{t-1}'\rho\beta + u_t \tag{3}$$

The model in equation (3) is non-linear in the parameters, and we can apply NLLS to achieve the C-O estimator.]

A couple of other things to note. If you look carefully at equation (2), you'll see that we have "lost" the first sample observation because of the "lagging" of the data. Later authors (e.g., Kadiyala, 1968) recognized that if we apply the Cochrane-Orcutt transformation to observations 2 to T in the sample, but transform the first observation on all of the variables by $(1 - \rho^2)^{1/2}$, then we obtain the feasible GLS estimator for the β parameters.
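Here's a small sketch of that transformation (it is often referred to as the Prais-Winsten transformation). The function name and the simulated data are hypothetical, and ρ is treated as known purely so that the result can be checked against textbook GLS with the AR(1) covariance matrix; in practice you'd plug in an estimate, r.

```python
import numpy as np


def transform_keep_first_obs(X, y, rho):
    """Quasi-difference rows 2..T and rescale row 1 by sqrt(1 - rho^2)."""
    scale = np.sqrt(1.0 - rho ** 2)
    y_star = np.concatenate(([scale * y[0]], y[1:] - rho * y[:-1]))
    X_star = np.vstack((scale * X[:1], X[1:] - rho * X[:-1]))
    return X_star, y_star


# Illustrative data only; any X, y and |rho| < 1 would do.
rng = np.random.default_rng(1)
T, rho = 50, 0.5
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = rng.normal(size=T)

# OLS on the transformed data ...
X_s, y_s = transform_keep_first_obs(X, y, rho)
beta_transformed = np.linalg.lstsq(X_s, y_s, rcond=None)[0]

# ... should match GLS using Omega[i, j] = rho^|i-j| / (1 - rho^2).
idx = np.arange(T)
Omega = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

print(np.allclose(beta_transformed, beta_gls))
```

The check works because the transformation matrix, P, satisfies P'P = Ω⁻¹ up to the scalar σu², which cancels out of the GLS formula.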

It remained for Beach and MacKinnon (1978) to show that this GLS estimator is not the full ML estimator (under normal errors), because it ignores the fact that the Jacobian of the transformation from the error term to the dependent variable is not unity. An extra term has to enter the log-likelihood function to take account of this, and this alters the estimates of all of the parameters in the model.
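To make the role of the Jacobian concrete, here is the exact Gaussian log-likelihood for model (1) with stationary AR(1) errors, written in the notation used above. This is the standard textbook form, not a quotation from Beach and MacKinnon's paper:

$$\log L(\beta, \rho, \sigma_u^2) = -\frac{T}{2}\log(2\pi\sigma_u^2) + \frac{1}{2}\log(1-\rho^2) - \frac{1}{2\sigma_u^2}\left[(1-\rho^2)(y_1 - x_1'\beta)^2 + \sum_{t=2}^{T}\left((y_t - \rho y_{t-1}) - (x_t' - \rho x_{t-1}')\beta\right)^2\right]$$

The term in square brackets is the sum of squares that GLS minimizes when the first observation is retained. The extra term, (1/2) log(1 - ρ²), is the log of the Jacobian. It depends on ρ, so maximizing the full function generally gives a different estimate of ρ, and hence of β and σu², than minimizing the sum of squares alone.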

Now, as far as the convergence properties of the iterative C-O estimator are concerned, let's start at the beginning.

One of the most influential papers in the historical development of econometrics was Denis Sargan's so-called "Colston paper" (Sargan, 1964). It derived this name from the fact that it was originally presented at the Colston Society conference on National Economic Planning, held at the University of Bristol in 1963. Among other things, this paper developed many of the ideas relating to generalized Instrumental Variables estimation; it laid out the "British approach" to econometric methodology; and it foreshadowed the (much later) literature on error-correction models.

In Appendix A to that paper, Sargan provided a proof of the convergence (in a finite number of steps) of a broad class of iterative schemes. That class includes the iterative C-O estimator. What Sargan showed was that such schemes must converge to a local minimum of the objective function. Multiple minima are possible - global convergence is not guaranteed.
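One way to see why the iterative C-O scheme belongs to a class like this is to write down the objective function explicitly. What follows is just a sketch of the intuition in the notation of equations (1) and (2), not a reproduction of Sargan's proof; the superscript (k), indexing the iterations, is my own notation:

$$S(\beta, \rho) = \sum_{t=2}^{T}\left[(y_t - \rho y_{t-1}) - (x_t' - \rho x_{t-1}')\beta\right]^2 = \sum_{t=2}^{T}\left[e_t(\beta) - \rho\, e_{t-1}(\beta)\right]^2 , \quad e_t(\beta) = y_t - x_t'\beta$$

$$\beta^{(k+1)} = \arg\min_{\beta} S(\beta, r^{(k)}) , \qquad r^{(k+1)} = \arg\min_{\rho} S(\beta^{(k+1)}, \rho)$$

$$S(\beta^{(k+1)}, r^{(k+1)}) \le S(\beta^{(k+1)}, r^{(k)}) \le S(\beta^{(k)}, r^{(k)})$$

The first expression for S shows that OLS applied to equation (2) minimizes S over β for a given ρ. The second shows that regressing the residuals on their own lagged values, with no intercept, minimizes S over ρ for a given β. So each pass through the steps can only lower (or leave unchanged) the sum of squares. That tells us the procedure goes "downhill", but not which valley it ends up in if S has more than one.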

Appendix A to Sargan's paper can be found here (see p. 153). The last passage in the Appendix has been suppressed by the publisher, but it reads as follows:

"..... that lies within a distance δβ of the limit point. If this point is (ai , bi) take k = (bi - bi*) / β. The corresponding point on the line defined above could be taken as the next point in the iteration, and unless k = 0, fi+1 > f*. This is a contradiction, showing that the sequence does not converge to the saddlepoint. The only case where the sequence does converge to the saddlepoint is where for some finite i, bi = b*. In this case the next point in the sequence is the saddlepoint, and so are all subsequent points. This case, however, occurs with probability zero."

So, as useful as the C-O estimator is, it suffers from a very serious weakness. There is no guarantee that it is unique. Although this shortcoming also arises with many ML estimation problems (when the first-order conditions are non-linear functions of the parameters), it's not particularly good news. It's also something that practitioners rarely acknowledge.

The non-uniqueness of the C-O estimator was highlighted very clearly by Dufour et al. (1980). Their paper was a response to a paper by Betancourt and Kelejian (1981), in which the latter authors showed that this problem could arise when the C-O estimator is applied to a regression model containing a lagged value of the dependent variable as a regressor. (Given the publication delays, the paper by Dufour et al. appeared in print before that by Betancourt and Kelejian.)

The bottom line: there is a non-uniqueness problem for the C-O estimator, whatever regressors appear in the model.

As a piece of practical advice, if this appears to be a concern in a particular application, you can always resort to a grid-search approach. In the case of AR(1) errors this is especially simple. The autocorrelation parameter, ρ, must satisfy the stationarity condition, |ρ| < 1. So, it is quite easy to condition on successive values of ρ on a fine grid over this interval; minimize the sum of squared residuals from equation (2) for each such value; and then select the value of ρ (and the associated regression coefficient estimates) for which this sum of squares is globally minimized.
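A grid search of this sort might look something like the following. Again, this is just a sketch: the function names, the grid end-points and spacing, and the simulated data are assumptions of mine. Counting the interior local minima of the sum of squares along the grid is an optional extra that gives a rough indication of whether multiple minima are an issue for the data at hand.

```python
import numpy as np


def conditional_ssr(X, y, rho):
    """SSR from OLS on equation (2) with rho held fixed; also returns beta."""
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]
    beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    resid = y_star - X_star @ beta
    return float(resid @ resid), beta


def grid_search_rho(X, y, n_grid=399):
    """Scan rho over an evenly spaced grid strictly inside (-1, 1)."""
    grid = np.linspace(-0.995, 0.995, n_grid)
    fits = [conditional_ssr(X, y, rho) for rho in grid]
    ssr = np.array([s for s, _ in fits])
    best = int(np.argmin(ssr))
    return grid[best], fits[best][1], grid, ssr


def count_interior_local_minima(ssr):
    """Number of grid points whose SSR is strictly below both neighbours."""
    mid = ssr[1:-1]
    return int(np.sum((mid < ssr[:-2]) & (mid < ssr[2:])))


# Simulated data (illustrative values only): intercept 1, slope 2, rho = 0.6.
rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])

rho_hat, beta_grid, grid, ssr = grid_search_rho(X, y)
n_min = count_interior_local_minima(ssr)
print(rho_hat, beta_grid, n_min)
```

If more than one local minimum shows up, the iterative procedure could stop at any of them, depending on where it starts. The grid search simply picks the one with the smallest sum of squares, and the grid can then be refined around that point if more precision is needed.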

A final comment is in order. Modifying our estimators to "compensate" for autocorrelation in the error term of a regression model isn't necessarily the best thing to do. Often, this autocorrelation is symptomatic of a model mis-specification, and what we really need to be considering is a modification of the model.

But that's another story.

References

Beach, C. M. and J. G. MacKinnon, 1978. A maximum likelihood procedure for regression with autocorrelated errors. Econometrica, 46, 51-58.

Betancourt, R. and H. Kelejian, 1981. Lagged endogenous variables and the Cochrane-Orcutt procedure. Econometrica, 49, 1073-1078.

Cochrane, D. and G. H. Orcutt, 1949. Application of least squares regression to relationships containing auto-correlated error terms. Journal of the American Statistical Association, 44, 32-61.

Dufour, J-M., M. J. I. Gaudry, and T. C. Liem, 1980. The Cochrane-Orcutt procedure: Numerical examples of multiple admissible minima. Economics Letters, 6, 43-48.

Kadiyala, K. R., 1968. A transformation used to circumvent the problem of autoregression. Econometrica, 36, 93-96.

Sargan, J. D., 1964. Wages and prices in the United Kingdom: A study in econometric methodology. In P. E. Hart, G. Mills, and J. K. Whitaker (eds.), Econometric Analysis for National Economic Planning, Vol. 16 of Colston Papers. Butterworths, London, 25-63.

1. Dave: Did you lose a rho in equation (2)?

1. Thanks - yes I did! Now fixed!
DG

2. In your description you state that "(iv) Take the residuals from this last regression, and re-estimate ρ, as in step (ii)." Why is that? I always thought that one should use the new estimate \hat{\beta} to get the residuals from equation (1), and then compute the new estimate for ρ by OLS. Sorry if I am mistaken here! Nice post, as always!

1. Thanks - a brain-freeze on my part! Now fixed.

DG

3. So if autocorrelation is detected in regression residuals, the Cochrane-Orcutt procedure should be used to correct it?

1. Really only as a "last resort". You need to look very carefully at the functional form, and especially the "dynamics" of the model specification. Problems in these areas are often the "cause" of the autocorrelation in the residuals.