## Monday, May 9, 2011

### A Trick of the Trade

Simultaneous Equations Models (SEMs), together with the treatment of measurement error, lie at the historical foundation of Econometrics as a discipline.

The idea of SEMs for the economy came from Jan Tinbergen, who estimated a 24-equation system for the Dutch economy in 1936. See Tinbergen (1959, pp.37-84) for an English translation. When the first Nobel Prize in Economic Science was awarded in 1969, Tinbergen shared the inaugural honour with Ragnar Frisch (a Norwegian econometrician) for their pioneering work that led to the development of econometrics as a recognized sub-discipline.

It's not that long ago that courses in econometrics included a solid amount of material relating to SEMs. Consequently, students became well aware of issues surrounding parametric identification. They were also very familiar with a raft of estimators designed to deliver consistent, and perhaps asymptotically efficient, estimates in the context of simultaneous systems. These included single-equation estimators - generally Instrumental Variables (IV) estimators such as 2SLS and the k-class family, including the Limited Information Maximum Likelihood (LIML) estimator; and full system estimators such as 3SLS and Full Information Maximum Likelihood (FIML).

These days, the estimation of SEMs seems to get relatively little attention in standard econometrics courses. Most students learn about 2SLS, and many of them appreciate that this is just a specific member of the IV family of estimators. However, few students realize that all of the other estimators - yes, including FIML - are also in the IV family. For example, see Hendry (1976).

That's a shame, because it's very helpful to know that there are some major unifying themes that underlie the econometrics that we learn. The subject is not just an ad hoc collection of pseudo-related results. As I've noted in an earlier posting (Cookbook Econometrics), I have some difficulties with econometrics courses that focus on what to do, rather than why to do it.

They say that hindsight is '20-20', and the estimation of SEMs is a great case in point. The optimality of FIML estimation was appreciated very early in the development of our discipline. Unfortunately, there was the small matter of computational barriers that precluded its widespread adoption. See Renfro (2004) and some of the other papers in that issue of JESM for an interesting discussion of the history of these computational issues.

So, in response, econometricians cunningly developed second-best estimators that had the minimal property of weak consistency, and in some cases were asymptotically efficient within certain contexts, but were also practical to implement. Given the computational resources of the day, they could actually be applied by you and me. And, in part, this is why a generation or two of standard econometrics textbooks dealt with SEMs by presenting the material in a particular order: identification, 2SLS, k-class, LIML, 3SLS and finally FIML. The estimators were discussed in order of increasing computational demands.

We all knew that FIML was 'optimal', but most students didn't have access to the computing power needed to apply it. When I was teaching a year-long grad. course on SEMs in the second half of the 1970s, I emphasised two unifying themes - maximum likelihood estimation, and IV estimation. The latter emphasis, in particular, was pretty progressive at the time.

Well, so much for the mini-lesson in econometric history!

I want to share a little SEM-related trick with you. Several of my students over the years have found it to be quite helpful. I believe it first came to light in a discussion paper authored by Les Godfrey and Mike Wickens in 1977, and published as Godfrey and Wickens (1982). It's also included as Exercise 6.18 in the second volume of Phillips and Wickens (1978).

Here's the situation. We want to estimate, as efficiently as possible, one or more structural equations which are clearly simultaneous in nature - that is, at least some of the equations have regressors which are endogenous, and need to be 'explained' by some other equations. However, as we add additional equations to the system, we introduce even more endogenous regressors, which in turn have to be 'explained' by further and further structural equations. We don't want to go on specifying a bigger and bigger model, with more and more equations, for several reasons, including:
• We probably won't know the appropriate structural specification for these extra equations that we are introducing.
• Even if we did, we may not have data for all of the variables that would be needed to estimate them.
• If the model gets too big, then we're not going to be able to use a 'system' estimator, such as FIML, unless we have a very large sample size.
• As the model gets bigger, the use of FIML or some other 'system' estimator becomes increasingly risky. We are trading off the hope for more asymptotic efficiency against the possibility of mis-specifying one or more of the equations, rendering FIML inconsistent, and hence useless.
Yes, I know we could simply use IV estimation for the few equations that we were interested in originally, but note that I said that we want to use an estimator that is (asymptotically) as efficient as possible. So, if a full system estimator such as FIML or 3SLS is out, the most efficient single-equation estimator is LIML. Many econometrics packages (including EViews) incorporate the LIML estimator for single structural equations.
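
To make the k-class family a little more concrete, here is a minimal pure-Python sketch. It assumes a single equation with one endogenous regressor and uses simulated data: the data-generating process, the variable names, and the helper functions (`solve`, `resid`, `k_class`, `liml_kappa`) are all illustrative assumptions, and none of this is the EViews example discussed later. The k-class estimator is [W'(I - kM)W]^(-1) W'(I - kM)y, where W holds the regressors and M is the residual-maker matrix for the full instrument set Z. Setting k = 0 gives OLS, k = 1 gives 2SLS, and LIML sets k to the smallest root of a determinantal equation.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def resid(v, cols):
    """Residuals from an OLS regression of v on the given columns."""
    beta = solve([[dot(a, b) for b in cols] for a in cols],
                 [dot(a, v) for a in cols])
    return [v[t] - sum(bj * c[t] for bj, c in zip(beta, cols))
            for t in range(len(v))]

def k_class(y, W, Z, k):
    """k-class estimate of y = W d + u, with instruments Z (lists of columns)."""
    MW = [resid(w, Z) for w in W]   # M W, column by column
    My = resid(y, Z)                # M y
    # W'(I - kM)W = W'W - k (MW)'(MW), because M is symmetric and idempotent
    A = [[dot(wi, wj) - k * dot(mi, mj) for wj, mj in zip(W, MW)]
         for wi, mi in zip(W, MW)]
    b = [dot(wi, y) - k * dot(mi, My) for wi, mi in zip(W, MW)]
    return solve(A, b)

def liml_kappa(y, y_endog, X1, Z):
    """Smallest root of det(A - kappa * B) = 0, for one endogenous regressor."""
    r1 = [resid(v, X1) for v in (y, y_endog)]  # partial out included exogenous only
    rz = [resid(v, Z) for v in (y, y_endog)]   # partial out all instruments
    a11, a12, a22 = dot(r1[0], r1[0]), dot(r1[0], r1[1]), dot(r1[1], r1[1])
    b11, b12, b22 = dot(rz[0], rz[0]), dot(rz[0], rz[1]), dot(rz[1], rz[1])
    p = b11 * b22 - b12 ** 2
    q = a11 * b22 + a22 * b11 - 2 * a12 * b12
    r = a11 * a22 - a12 ** 2
    return (q - math.sqrt(max(q * q - 4 * p * r, 0.0))) / (2 * p)

# Simulated (hypothetical) data: y2 is endogenous because u is correlated with w.
random.seed(0)
T = 200
const = [1.0] * T
x1, x2, z1, z2, w, e = ([random.gauss(0, 1) for _ in range(T)] for _ in range(6))
u = [0.8 * wi + ei for wi, ei in zip(w, e)]
y2 = [1 + 0.5 * a - 0.5 * b + c + d + wi
      for a, b, c, d, wi in zip(x1, x2, z1, z2, w)]
y1 = [1 + 0.5 * a - 1.0 * b + 2.0 * c + ui
      for a, b, c, ui in zip(x1, x2, y2, u)]

X1 = [const, x1, x2]   # included exogenous regressors
Z = X1 + [z1, z2]      # full instrument set (two excluded instruments)
W = X1 + [y2]          # regressors in the structural equation

kappa = liml_kappa(y1, y2, X1, Z)
ols = k_class(y1, W, Z, 0.0)
tsls = k_class(y1, W, Z, 1.0)
liml = k_class(y1, W, Z, kappa)
print("kappa:", kappa)
print("OLS  :", ols)
print("2SLS :", tsls)
print("LIML :", liml)
```

In this simulated set-up the equation is over-identified, so kappa will generally be a little above 1 and the LIML and 2SLS estimates will differ slightly. If the equation were exactly identified, kappa would equal 1 and the two estimators would coincide.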

That's fine, but there may be a better way to proceed. What we can do is to set up the original structural equations that we are interested in as a system - or, more correctly, a sub-system. By the latter, I mean that we have the basic structural equations, and then we add extra equations, one for each of the endogenous regressors in the model that we have not yet 'explained'. However, we don't try to specify structural relationships for these remaining endogenous variables. Instead, we write down a reduced form equation for each of them. That is, we 'explain' each of them just in terms of all of the predetermined variables in our model. (Predetermined variables comprise all exogenous variables, plus any lagged exogenous or lagged endogenous variables that have been used so far in the model.)

So, suppose we have 2 structural equations that we're primarily interested in:

$$y_{1t} = a_0 + a_1 x_{1t} + a_2 y_{1,t-1} + a_3 x_{2t} + a_4 y_{2t} + a_5 y_{3t} + u_t$$

$$y_{2t} = b_0 + b_1 x_{3t} + b_2 y_{3t} + b_3 y_{1,t-1} + b_4 x_{4t} + v_t$$

There are 3 endogenous variables: y1, y2 and y3.
There are 6 predetermined variables: constant, x1, x2, x3, x4 and the lagged value of y1.
The random error terms are denoted u and v.

However, there are fewer equations than endogenous variables. What I'm suggesting is that we take this 2-equation 'sub-system' and augment it with a third equation, of the form:

$$y_{3t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 x_{3t} + c_4 x_{4t} + c_5 y_{1,t-1} + w_t ,$$

where w is again a random error term.

You'll recognize that this last equation is just an (unrestricted) reduced form equation. We now have a 3-equation system for our 3 endogenous variables.
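
The bookkeeping behind this augmentation step can be sketched in a few lines of Python. This is only an illustration of the counting logic for the three-equation example above: the function names (`augment`, `order_condition_ok`) and the dictionary representation of the equations are my own assumptions, and no estimation is done here.

```python
def augment(structural, predetermined):
    """Append an unrestricted reduced-form equation for every endogenous
    regressor that does not have a structural equation of its own."""
    system = {lhs: list(rhs) for lhs, rhs in structural.items()}
    for rhs in structural.values():
        for var in rhs:
            if var not in predetermined and var not in system:
                # Reduced form: regress on ALL predetermined variables.
                system[var] = list(predetermined)
    return system

def order_condition_ok(rhs, predetermined):
    """Necessary (not sufficient) counting rule for identification:
    at least as many predetermined variables as regressors."""
    return len(predetermined) >= len(rhs)

predetermined = ["const", "x1", "x2", "x3", "x4", "y1(-1)"]
structural = {
    "y1": ["const", "x1", "y1(-1)", "x2", "y2", "y3"],
    "y2": ["const", "x3", "y3", "y1(-1)", "x4"],
}

system = augment(structural, predetermined)
for lhs, rhs in system.items():
    print(lhs, "=", " + ".join(rhs))

checks = {lhs: order_condition_ok(rhs, predetermined)
          for lhs, rhs in structural.items()}
print(checks)
```

Here the only endogenous regressor without its own equation is y3, so a single reduced-form equation is added and the system has as many equations as endogenous variables.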

Now, here's the important result:
If we estimate this augmented system by FIML, the estimates that we get for the coefficients in the structural sub-system (the a's and the b's) are exactly the LIML estimates.

There's an important advantage of using the sub-system FIML approach, rather than simply estimating each of the original, well-specified, structural equations individually by LIML. Once the (augmented) system is estimated you immediately have a complete multi-equation model, with estimates for the coefficients and the error covariance matrix, and you can use this model to forecast all of the endogenous variables and to simulate policy 'shocks'. This won't be easy to do if you've just estimated some of the equations by LIML (or any other single-equation estimator).

Let's look at an example of all of this, using EViews. As usual, the data are available on the Data page that goes with this blog, and there is an EViews workfile on the Code page.

The example involves a single structural equation:

$$y_{1t} = a_0 + a_1 x_{1t} + a_2 y_{1,t-1} + a_3 x_{2t} + a_4 y_{2t} + u_t$$

Here are the results of LIML estimation. Note that LIML is an IV estimator, so there must be at least as many instruments as there are regressors - 5, including the intercept, in this case. So, I've declared a second lag of y1t as the fifth instrument.

Any valid instrument will do, as long as I also include it in the reduced form equation that's used to complete the system. In our case, that reduced form equation is:

$$y_{2t} = c_0 + c_1 x_{1t} + c_2 x_{2t} + c_3 y_{1,t-1} + c_4 y_{1,t-2} + w_t ,$$

and the FIML results are:

Coefficients c(1) to c(5) correspond to the coefficients a0 to a4 in the structural equation of interest. You can see that the LIML and FIML coefficients match. The standard errors differ because they are calculated in quite different ways under LIML and FIML estimation.
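
For readers who would like to see what FIML is maximizing in this two-equation example, here is a sketch of the Gaussian log-likelihood. It assumes normally distributed errors with an unrestricted covariance matrix, which is the usual FIML set-up but is not spelled out above. The symbols $B$, $\Sigma$, $e_t$ and $E$ are my own notation, not taken from the references. Moving the current endogenous variables to the left-hand side, their coefficient matrix is

$$B = \begin{pmatrix} 1 & -a_4 \\ 0 & 1 \end{pmatrix}, \qquad \det B = 1 .$$

With $e_t = (u_t, w_t)'$ and $T$ observations, the log-likelihood is

$$\ell = \text{const} + T \ln|\det B| - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T} e_t' \Sigma^{-1} e_t .$$

Concentrating out $\Sigma$ (its maximizer is $\hat{\Sigma} = E'E/T$, where $E$ is the $T \times 2$ matrix with rows $e_t'$) gives

$$\ell_c = \text{const} + T \ln|\det B| - \frac{T}{2}\ln\left|\frac{E'E}{T}\right| .$$

Because $\det B = 1$ here, the Jacobian term drops out, and FIML for this augmented system amounts to choosing the a's and c's to minimize $|E'E|$.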

In the attached EViews file I've included the estimated Model, corresponding to this estimated System. You can use it to generate static and dynamic, and deterministic and stochastic simulations for both y1 and y2.

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

References

Epple, D. and B. T. McCallum (2005). Simultaneous equation econometrics: The missing example. Mimeo., Department of Economics, Carnegie-Mellon University.

Godfrey, L. G. and M. R. Wickens (1982). A simple derivation of the limited information maximum likelihood estimator. Economics Letters, 10, 277-283.

Hendry, D. F. (1976). The structure of simultaneous equations estimators. Journal of Econometrics, 4, 51-88.

Phillips, P. C. B. and M. R. Wickens (1978). Exercises in Econometrics. Ballinger, Cambridge MA.

Renfro, C. G. (2004). Econometric software: The first fifty years in perspective. Journal of Economic and Social Measurement, 29, 9-107.

Tinbergen, J. (1959). Selected Papers. Eds. L. H. Klaassen, L. M. Koyck and J. H. Witteveen, North-Holland, Amsterdam.