Friday, July 11, 2014

Finite-Sample Properties of the 2SLS Estimator

During a recent conversation with Bob Reed (U. Canterbury) I recalled an interesting experience that I had at the American Statistical Association Meeting in Houston, in 1980. I was sitting in a session listening to an author presenting a paper about the bias and MSE of certain simultaneous equations estimators. The results were based on a Monte Carlo experiment. However, something just didn't seem right.

I looked at the guy sitting next to me - I didn't know him, but he was also looking puzzled. Then, at the same time, we both said to each other, "But the first two moments of that estimator don't exist!" The next thing out of our mouths was, "Who's going to tell him?"

The guy next to me turned out to be Tom Fomby, and I believe he was the one who politely explained to the speaker that his results were nonsensical.

If (the sampling distribution of) an estimator doesn't have a well-defined mean, then it's nonsensical to talk about that estimator's bias. Equally, if it doesn't have a well-defined variance, then it makes no sense to talk about its MSE. In other words, the Monte Carlo simulation results were trying to measure something that didn't exist!

So, what was going on here?

Well, the existence, or otherwise, of the (finite-sample) moments of simultaneous equations estimators generally depends on the degree to which the equation in question is over-identified. Let's focus here on the so-called "limited information" or "single-equation" estimators for the structural form of the model. Examples of such estimators include 2SLS, LIML, etc. The k-Class family of estimators includes these two estimators as special cases. For example, the 2SLS estimator is the k-Class estimator with k = 1.

Any k-Class estimator for which plim(k) = 1 is weakly consistent, so LIML and 2SLS are consistent estimators. OLS corresponds to k = 0, and so it is an inconsistent estimator in this context.
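To make the k-Class family a little more concrete, here's a minimal Python/NumPy sketch. It's my own illustration (not code from Kinal's paper or from any econometrics package), and the function name k_class and its argument layout are assumptions. It uses the standard form δ̂(k) = [Z'(I - kM_W)Z]^(-1) Z'(I - kM_W)y, where Z = [X, Y], W holds all of the predetermined variables in the system, and M_W = I - W(W'W)^(-1)W'. Setting k = 0 gives OLS and k = 1 gives 2SLS; LIML would need k to be computed from the data, which isn't shown here.

```python
import numpy as np

def k_class(y, X, Y, W, k):
    """k-Class estimate of delta = (beta, gamma) in y = X beta + Y gamma + u.

    y : (n,)          dependent endogenous variable
    X : (n, K1)       included predetermined variables
    Y : (n, G1 - 1)   included right-hand-side endogenous variables
    W : (n, K1 + K2)  all predetermined variables in the system (must include X)
    k : scalar        k = 0 gives OLS, k = 1 gives 2SLS
    """
    Z = np.column_stack([X, Y])
    # M_W Z: residuals from regressing each column of Z on W (the X columns give zeros)
    MZ = Z - W @ np.linalg.lstsq(W, Z, rcond=None)[0]
    # Z'(I - k M_W) Z and Z'(I - k M_W) y, using the fact that M_W is symmetric and idempotent
    lhs = Z.T @ Z - k * (MZ.T @ MZ)
    rhs = Z.T @ y - k * (MZ.T @ y)
    return np.linalg.solve(lhs, rhs)

# Illustration on simulated data (all numbers here are arbitrary choices)
rng = np.random.default_rng(0)
n = 200
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # K1 = 1 (intercept), K2 = 2
X = W[:, :1]
v = rng.normal(size=n)
Y = (W[:, 1] + W[:, 2] + v).reshape(-1, 1)                   # G1 - 1 = 1 endogenous regressor
y = 1.0 + 2.0 * Y[:, 0] + 0.8 * v + rng.normal(size=n)       # u is correlated with v
print("OLS  (k = 0):", k_class(y, X, Y, W, 0.0))
print("2SLS (k = 1):", k_class(y, X, Y, W, 1.0))
```

Because u is correlated with v in this design, the OLS (k = 0) slope will typically be pulled away from 2, while the 2SLS (k = 1) slope is consistent.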

Of course, consistency is a large-sample, asymptotic property, and a very weak one at that. 

What can we say about the finite-sample properties of these estimators - properties such as bias and MSE?

There are several important papers that address this question. Perhaps the most general of these is Kinal (1980). In that paper, Terry concisely summarises the previous literature, and then proves a key result. I'll present that result here just for the case of the 2SLS estimator.

Suppose we want to estimate the following structural equation, based on a sample of size "n":

                  y = Xβ + Yγ + u

where y and u are (n x 1); X is (n x K1); β is (K1 x 1); Y is (n x (G1 - 1)); and γ is ((G1 - 1) x 1).

The y and Y variables (G1 of them in total) are endogenous, and the X variables are predetermined. In addition, K2 predetermined variables in the simultaneous system are excluded from this equation.

So, the order condition for the identification of the parameters in this equation is K2 ≥ G1 - 1, or K2 - G1 + 1 ≥ 0.

The degree of over-identification for the equation is D = K2 - G1 + 1.

Terry's result tells us that the m-th moment of the 2SLS estimator exists if and only if m < D + 1.

If the equation is just-identified, then D = 0, and so no moments exist!

If D = 1, then the mean of the 2SLS estimator exists, but its variance doesn't.

We require that D ≥ 2 for the mean and variance of the 2SLS estimator to be defined in finite samples.
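The rule is simple enough to encode as a small Python helper. This is just a sketch of the statement of the result for 2SLS as given above (it doesn't check the assumptions under which Kinal proves it), and the function names are hypothetical.

```python
def overid_degree(K2, G1):
    """Degree of over-identification D = K2 - G1 + 1; the order condition requires D >= 0."""
    D = K2 - G1 + 1
    if D < 0:
        raise ValueError("order condition fails: the equation is under-identified")
    return D

def tsls_moment_exists(m, K2, G1):
    """Kinal's (1980) result for 2SLS: the m-th moment exists iff m < D + 1."""
    return m < overid_degree(K2, G1) + 1

# One right-hand-side endogenous variable (G1 = 2), with 1, 2 or 3 excluded instruments
for K2 in (1, 2, 3):
    D = overid_degree(K2, 2)
    print(f"K2 = {K2}, D = {D}: mean exists = {tsls_moment_exists(1, K2, 2)}, "
          f"variance exists = {tsls_moment_exists(2, K2, 2)}")
```

For G1 = 2 this gives D = 0 (neither the mean nor the variance exists), D = 1 (the mean exists, the variance doesn't) and D = 2 (both exist), matching the three cases above.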

So, if we have a just-identified equation, talking about the (finite-sample) MSE of the 2SLS estimator makes no sense at all!
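One way to see what goes wrong in a Monte Carlo experiment like the one in Houston is to simulate 2SLS for a just-identified equation (D = 0) and for one with D = 2, and compare the sample means across batches of replications. The sketch below is my own illustration with arbitrary design choices, all of them assumptions: no X variables, one right-hand-side endogenous variable (so G1 = 2), normal errors with correlation rho, instruments held fixed across replications, and a deliberately weak first-stage coefficient pi so that the heavy tails show up in a modest number of replications. The names simulate_tsls and batch_means are hypothetical. When D = 0 the batch means have no population mean to settle down to, so they can jump around erratically, whereas the median and the 5%/95% quantiles are well defined whatever the value of D.

```python
import numpy as np

def simulate_tsls(n_instr, n=50, pi=0.1, gamma=1.0, rho=0.5, reps=20_000, seed=0):
    """Monte Carlo draws of the 2SLS estimator of gamma in y = gamma * Y + u.

    First stage: Y = Z pi + v, with n_instr excluded instruments in Z held fixed
    across replications; u and v are standard normal with correlation rho.
    Here G1 = 2 and K2 = n_instr, so D = n_instr - 1.
    """
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, n_instr))                  # fixed instruments
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection onto the columns of Z
    v = rng.normal(size=(reps, n))                     # one row per replication
    u = rho * v + np.sqrt(1.0 - rho**2) * rng.normal(size=(reps, n))
    Y = Z @ np.full(n_instr, pi) + v                   # first-stage equation
    y = gamma * Y + u                                  # structural equation
    Yhat = Y @ P                                       # fitted values (P is symmetric)
    # 2SLS with a single regressor: (Yhat'Y)^(-1) Yhat'y, row by row
    return np.sum(Yhat * y, axis=1) / np.sum(Yhat * Y, axis=1)

def batch_means(est, n_batches=10):
    """Sample mean of the estimates within each block of consecutive replications."""
    return np.array([b.mean() for b in np.array_split(est, n_batches)])

for n_instr in (1, 3):                                 # D = 0 and D = 2
    est = simulate_tsls(n_instr)
    print(f"K2 = {n_instr}, D = {n_instr - 1}")
    print("  batch means:    ", np.round(batch_means(est), 3))
    print("  median, 5%, 95%:", np.round(np.quantile(est, [0.5, 0.05, 0.95]), 3))
```

With strong instruments the moments still fail to exist when D = 0, but the extreme draws can be so rare that a simulation of modest size never produces one, which is exactly how averaging the replications can look deceptively sensible.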

Now, recall that the "standard errors" that we report with estimated regression coefficients are estimates of the true standard deviations of those coefficient estimates. If the variance of an estimator isn't defined, neither is its standard deviation. In such cases it makes no sense to report a standard error.

What does it mean to estimate something that isn't defined?

So, what about the 2SLS standard errors that are reported by your favourite econometrics package, even when the equation being estimated is just-identified, or if D = 1? What you have to remember is that those are actually "asymptotic standard errors". They are estimates of the standard deviations of the asymptotic distributions of the 2SLS coefficient estimates. The asymptotic distribution of the 2SLS estimator is well defined, regardless of the degree of over-identification of the equation. Because this asymptotic distribution is normal, all of its moments are defined - so there's no problem if n is infinitely large.
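For concreteness, here's a sketch of the conventional (homoskedastic) asymptotic standard errors for 2SLS: the square roots of the diagonal elements of σ̂²(Ẑ'Ẑ)^(-1), where Ẑ is the projection of Z = [X, Y] onto the full set of predetermined variables W, and σ̂² is computed from the structural residuals y - Zδ̂. The function name is hypothetical, and dividing by n in σ̂² is an assumption; packages differ in their degrees-of-freedom corrections and often offer robust alternatives. Notice that nothing in the calculation looks at D: it happily returns finite numbers for a just-identified equation, even though the finite-sample standard deviations they are supposed to approximate don't exist.

```python
import numpy as np

def tsls_with_asymptotic_se(y, X, Y, W):
    """2SLS estimates and conventional asymptotic standard errors.

    Assumes homoskedastic errors and divides the residual sum of squares by n
    (no degrees-of-freedom correction); software packages differ on both points.
    """
    Z = np.column_stack([X, Y])
    Zhat = W @ np.linalg.lstsq(W, Z, rcond=None)[0]      # P_W Z, the first-stage fitted values
    delta = np.linalg.solve(Zhat.T @ Zhat, Zhat.T @ y)   # 2SLS coefficients
    resid = y - Z @ delta                                # structural residuals use Z, not Zhat
    sigma2 = resid @ resid / y.shape[0]
    cov = sigma2 * np.linalg.inv(Zhat.T @ Zhat)          # estimated asymptotic covariance matrix
    return delta, np.sqrt(np.diag(cov))

# Just-identified example: K1 = 1 (intercept), K2 = 1, G1 = 2, so D = 0
rng = np.random.default_rng(1)
n = 100
W = np.column_stack([np.ones(n), rng.normal(size=n)])
X = W[:, :1]
v = rng.normal(size=n)
Y = (W[:, 1] + v).reshape(-1, 1)
y = 0.5 + 1.5 * Y[:, 0] + 0.5 * v + rng.normal(size=n)
delta, se = tsls_with_asymptotic_se(y, X, Y, W)
print("2SLS estimates: ", delta)
print("asymptotic s.e.:", se)   # finite numbers, even though D = 0
```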

However, in finite samples, the problem of the existence of moments is a real one. Moreover, as a final point you might ask yourself, "Why do we resort to reporting asymptotically valid results at all, when our sample size is finite?" 

The usual reason we give is that the asymptotic result is often a useful approximation to the true finite-sample result, especially if the latter is very difficult to derive.

That doesn't seem to provide very much comfort here - we know the exact finite-sample result: it involves the possible non-existence of moments.

Why, then, would we choose to approximate non-existing moments with something based on large-n asymptotics? Or with anything else, for that matter!

The bottom line?

Be very careful how you interpret the properties of simultaneous equations estimators, such as 2SLS. Even talking about some of these properties may not make complete sense, depending on the degree to which the equation being estimated is over-identified.

Terry Kinal's paper deals with certain "limited information" estimators for simultaneous equations models. There are some corresponding interesting results relating to the existence of the moments of "full information" estimators, such as FIML and 3SLS. But that's a story for another day.


Kinal, T. W., 1980. The existence of moments of k-class estimators. Econometrica, 48, 241-249.

© 2014, David E. Giles


  1. Interesting post! I've read about this result before, but I didn't quite realize the implications - it sounds like reporting asymptotic standard errors is really questionable when we know that in fact "true" standard errors don't exist (or should software just report "infinity" instead?).

    On the other hand, it seems to me that confidence intervals should be OK even when standard errors don't exist, since quantiles can be defined for any distribution (and presumably, once they exist, true CIs should converge to the usually reported asymptotic ones with sample size). Is such an understanding correct?

    1. You could re-define the confidence interval so that it was constructed using the quantiles. This is always possible. To do this you'd need to use the full exact finite-sample DISTRIBUTION of the 2SLS estimator. This isn't at all simple! This CI would converge to the asymptotic one, but why would the latter be of interest if you have a finite sample? :-)

    2. "This CI would converge to the asymptotic one, but why would the latter be of interest if you have a finite sample?"
      Isn't that always the case with asymptotic results though? :)

      My understanding is like this: let's say I wanted to do a Monte Carlo experiment - for given true parameters and sample size n, I could obtain a confidence interval (a_n, b_n) (of some particular parameter, let's say) by simulating many datasets of size n, doing the estimation for each, obtaining the sampling distribution of the estimates and taking, say, the 5% and 95% quantiles. Repeating this for different sizes, I should find that (a_n, b_n) converges to (a, b) from the asymptotic formula for large n. This gives me some confidence that the asymptotic CI is a reasonable approximation.

      Whereas if I tried the same thing for the standard error s_n, I would run into trouble, because the sampling distribution at any given size n would not have a well-defined second moment (how exactly would this manifest in simulation? hm, I should try it some time). Thus this kind of justification for relying on the asymptotic standard error falls apart (in a way that it doesn't for the CI).

    3. You can always bootstrap a confidence interval - we all know that. It will be valid without relying on any appeal to asymptotics. But that's not the point of this post.