Wednesday, September 28, 2011

Estimating Models With "Under-Sized" Samples

"....and there is no new thing under the sun."
Ecclesiastes 1:9 (King James Bible)

The first part of my career was spent at New Zealand's central bank (the Reserve Bank of N.Z.), where I was heavily involved in the construction and use of large-scale macroeconometric models. By the mid 1970s our models involved more than 100 equations (of which about 50 were structural relationships that had to be estimated, and the rest were accounting identities). The basic investigative estimation was undertaken using OLS, and variants such as the Almon estimator for distributed lag models. Boy, this dates me!

Of course, we were well aware that OLS wasn't appropriate for the finished product. These models were simultaneous equations models, so OLS was inconsistent. Obviously, something more appropriate was needed, but which estimator should we use?

Ideally, it would have been nice to have used a full systems estimator, such as 3SLS or FIML. I've posted previously about these estimators, and some of the pros and cons of their use, relative to single-equation (limited information) estimators, such as 2SLS or other Instrumental Variables (IV) estimators. I won't go over old ground again here.

So, why were full system estimators not a practical option? Well, we were using quarterly time-series data to estimate the models. The number of observations available varied from equation to equation, depending on the variables in question. The most data we had was 48 observations, and if we took the longest "common" sample (available for all of the equations) it was, of course, somewhat less.

Once the identities were substituted out of the model, we would have been faced with the problem of estimating 50 structural equations using a sample of fewer than 50 observations. This can't be done using a full system estimator such as 3SLS or FIML (Klein, 1971; Brown, 1981). This used to be referred to as a situation where the sample is "under-sized" (relative to the size of the model).
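A quick way to see the 3SLS side of this: the estimator needs the inverse of the estimated covariance matrix of the structural disturbances, which is G x G but is built from only n residual vectors, so its rank can't exceed n. Here is a minimal sketch in Python (NumPy), using simulated numbers as a stand-in for residuals. The sizes n = 40 and G = 50 are illustrative assumptions, not the actual figures for the Reserve Bank model.

```python
import numpy as np

rng = np.random.default_rng(0)

n, G = 40, 50                      # hypothetical: 40 observations, 50 structural equations
E = rng.standard_normal((n, G))    # simulated stand-in for the n x G matrix of residuals

Sigma_hat = E.T @ E / n            # G x G estimated disturbance covariance matrix
rank = np.linalg.matrix_rank(Sigma_hat)

# rank is at most n, so with n < G the matrix is singular and has no ordinary inverse
print(rank, G)
```

With fewer observations than equations, the rank falls short of G, and the weighting matrix that 3SLS relies on simply doesn't exist.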

So, this meant that we had to turn to limited information estimators instead, in order to get consistent (but less asymptotically efficient) estimates of the model's parameters. IV estimation was the obvious choice, but which instruments should one use? Before answering that question, we have to keep in mind that this model was being estimated and used by a central bank, and we were answerable to the Board of Governors, and on occasion to the Minister of Finance (who also happened to be the rather volatile Prime Minister). In other words, this wasn't a pure research exercise; and I must say that in those days econometric models were viewed with more than just a little skepticism by some of the economic decision-makers that we had to deal with.

This being the case, one thing that we learned pretty quickly was that it was not a good idea to ever give the impression that any choices we made could be (mis-) interpreted as being "arbitrary"! Imagine trying to explain to one of the Governors that there was an infinite number of ways that we could have chosen the instruments for our IV estimation, and then trying to justify our particular choice, equation by equation. That was a real possibility!

In one sense, OLS was quite appealing - it was easy to explain; and it produced unique estimates! However, we knew that this wasn't good enough.

The obvious solution was to use 2SLS. Why? Not only is it consistent, but it's also asymptotically efficient among all IV estimators for which the instruments are linear combinations of the predetermined variables. It also has the same asymptotic efficiency as the LIML estimator, so we could argue that it was the "optimal" estimator to use, given the constraints we faced.

However, life wasn't that simple! Recall that we had at most 48 observations. The model had 99 current exogenous variables. The number of "predetermined" variables was much greater than this, once lagged exogenous and lagged endogenous regressors were taken into account. So what? Well, recall what's involved when you apply the 2SLS estimator. Focus on a particular equation. At the first stage you take each right-hand-side endogenous regressor, regress it against all of the predetermined variables in the entire model, using OLS, and get the predicted series. Then, at the second stage... Whoops! We can't even get to the second stage. The first stage is infeasible. We need to fit OLS regressions involving n = 48 and k > 99. Again, it can't be done!

This is another example of an "under-sized" sample. The sample is too small, relative to the size of the model, and relative to the estimator that we're wanting to use.
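To make the first-stage problem concrete, here is a minimal sketch in Python (NumPy) with simulated data. The value k = 100 and the random numbers are purely illustrative assumptions. With more regressors than observations, X'X has rank at most n, so it can't be inverted. A minimum-norm least-squares routine will still return coefficients, but the fit reproduces the endogenous regressor exactly, so a "second stage" based on it would just be OLS again.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 48, 100                     # 48 quarters, 100 hypothetical predetermined variables
X = rng.standard_normal((n, k))    # simulated stand-in for the predetermined variables
y = rng.standard_normal(n)         # simulated right-hand-side endogenous regressor

rank_X = np.linalg.matrix_rank(X)  # equals the rank of X'X, and cannot exceed n

# A minimum-norm least-squares solution exists, but it fits y perfectly.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
max_abs_resid = np.max(np.abs(y - X @ coef))

print(rank_X, k, max_abs_resid)
```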

Now we're really in trouble. On the one hand we want to do a good job. At the very least, we want to get consistent estimates of the coefficients in each equation of the model. We'd also like to retain as much asymptotic efficiency as possible. On the other hand, we don't want to introduce unwanted arbitrariness into the choice of estimator. Is there a solution?

Fortunately, yes! The trick is to try to condense the information in all of those predetermined variables into a (much) smaller set of new variables, small enough in number to allow us to perform the first stage of 2SLS. We're not going to be able to retain all of the information, but we can do pretty well if we use the first few Principal Components of the matrix of observations on all of the predetermined variables. I used a similar trick in two earlier posts (here and here) in connection with the OECD's "Better Life Index".

This idea had been kicking around in the context of 2SLS since Kloek and Mennes (1960), and it was taken up in the context of large macroeconometric models by a host of other authors, including Amemiya (1966), Mitchell (1971), Preston (1972), McCarthy (1972), and Klein (1973). We began using the Two Stage Principal Components (2SPC) estimator for the New Zealand model in 1974 (see Giles and Morgan, 1977), and the Bank of Finland (Hirvonen, 1975) was also using it around that time.

In our case, we found that the first 9 principal components explained over 99% of the variation in the predetermined variables in the model. The 2SPC estimator was definitely feasible! As the principal components are linear combinations of the data, with unique weights determined by the eigenvectors of the data's cross-product matrix, we got close to solving the asymptotic efficiency issue without introducing unwanted arbitrariness. It was a "win-win" situation.
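For readers who'd like to see the mechanics, here is a rough sketch of one simple variant of the idea in Python (NumPy). It is not the code we used in the 1970s, and the details are my assumptions for illustration: the function names, the standardization of the variables, the choice of m = 3 components, and the simulated data are all hypothetical. The instruments are a constant, the included exogenous variable, and the first m principal components of the predetermined variables.

```python
import numpy as np

def first_principal_components(Z, m):
    """Return the first m principal component scores of Z (n x k) and their variance share."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)          # standardize each column
    U, s, Vt = np.linalg.svd(Zs, full_matrices=False)
    share = np.cumsum(s**2) / np.sum(s**2)             # cumulative share of total variation
    return U[:, :m] * s[:m], share[m - 1]              # scores are Zs @ Vt[:m].T

def two_stage_pc(y, Y_endog, X_exog, Z, m):
    """2SLS using a constant, X_exog and the first m principal components of Z as instruments."""
    n = len(y)
    pcs, share = first_principal_components(Z, m)
    W = np.column_stack([np.ones(n), X_exog, pcs])     # instrument matrix
    # First stage: fitted values of the endogenous regressor(s) given the instruments.
    Y_hat = W @ np.linalg.lstsq(W, Y_endog, rcond=None)[0]
    # Second stage: OLS of y on the fitted endogenous regressor(s), constant and X_exog.
    R = np.column_stack([Y_hat, np.ones(n), X_exog])
    coef = np.linalg.lstsq(R, y, rcond=None)[0]
    return coef, share

# Simulated example: 48 observations, 100 predetermined variables driven by 3 common factors.
rng = np.random.default_rng(2)
n, k, m = 48, 100, 3
F = rng.standard_normal((n, 3))
Z = F @ rng.standard_normal((3, k)) + 0.05 * rng.standard_normal((n, k))
x1 = rng.standard_normal(n)                            # included exogenous variable
v = rng.standard_normal(n)
u = 0.8 * v + 0.6 * rng.standard_normal(n)             # correlated with v, so y2 is endogenous
y2 = F @ np.array([1.0, -1.0, 0.5]) + x1 + v
y1 = 1.0 + 0.5 * y2 + 2.0 * x1 + u                     # structural equation of interest

coef, share = two_stage_pc(y1, y2, x1, Z, m)
print(coef)    # order: coefficient on y2, constant, coefficient on x1
print(share)   # share of variation in Z captured by the first m components
```

The first stage now involves only m + 2 regressors rather than 100, so it is perfectly feasible with 48 observations. In a real application the number of components retained would be guided by the share of variation explained, much as in the 9-component case described above.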

(An alternative "solution", namely using the generalized inverse of the X'X matrix, was proposed independently by Fisher and Wadycki (1971), and Swamy and Holmes (1971). However, this was subsequently shown by Joosten (1980) to be just a special case of the 2SPC estimator. Another approach was suggested by Fiebig et al. (1983).)

Well, so much for the history lesson. Imagine my surprise, then, when I saw the recent papers by Ng and Bai (2009), and Winkelfried and Smith (2011). These authors pick up the idea of using principal components to construct a reduced set of instruments for IV estimation. Winkelfried and Smith refer to the old undersized-sample literature, but their objective, and that of Ng and Bai, is somewhat different - bias reduction, without sacrificing asymptotic efficiency.

Sometimes those old ideas are real "keepers"!

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

References

Amemiya, T. (1966). On the use of principal components of independent variables in two-stage least-squares estimation. International Economic Review, 7, 283-303.

Brown, B. W. (1981). Sample size requirements in full information maximum likelihood estimation. International Economic Review, 22, 443-459.

Fiebig, D. G., S. A. Kidwai and H. Theil (1983). Simultaneous equation estimation from undersized samples. Statistics and Probability Letters, 1, 229-232.

Fisher, W. D. and W. J. Wadycki (1971). Estimating a structural equation in a large system. Econometrica, 39, 461-465.

Giles, D. E. A. and G. H. T. Morgan (1977). Alternative estimates of a large New Zealand econometric model. New Zealand Economic Papers, 11, 52-67.

Joosten, G. (1980). Some remarks on the undersized sample problem in econometrics. Economics Letters, 6, 137-143.

Klein, L. R. (1971). Forecasting and policy evaluation using large scale econometric models: The state of the art. In M. D. Intriligator (ed.), Frontiers of Quantitative Economics, North-Holland, Amsterdam.

Klein, L. R. (1973). The treatment of undersized samples in econometrics. In A. A. Powell and R. A. Williams (eds.), Econometric Studies of Macro and Monetary Relations, North-Holland, Amsterdam.

Kloek, T. and L. B. M. Mennes (1960). Simultaneous equations estimation based on principal components of predetermined variables. Econometrica, 28, 45-61.

McCarthy, M. D. (1972). The Wharton Quarterly Forecasting Model, Mark III. Studies in Quantitative Economics No. 6, Economic Research Unit, University of Pennsylvania, Philadelphia.

Mitchell, B. M. (1971). Estimation of large econometric models by principal components and instrumental variables methods. Review of Economics and Statistics, 53, 140-146.

Ng, S. and J. Bai (2009). Selecting instrumental variables in a data rich environment. Journal of Time Series Econometrics, 1 (1), article 4.

Preston, R. S. (1972). The Wharton Annual and Industry Forecasting Model. Studies in Quantitative Economics No. 7, Economic Research Unit, University of Pennsylvania, Philadelphia.

Swamy, P. A. V. B. and J. Holmes (1971). The use of undersized samples in the estimation of simultaneous equation systems. Econometrica, 39, 455-459.

Winkelfried, D. and R. J. Smith (2011). Principal components instrumental variables estimation. CWPE 1119, University of Cambridge.

Hirvonen, J. (1975). On the use of two stage least squares with principal components. Mimeo., Bank of Finland, Helsinki.