Sunday, October 12, 2014

Illustrating Asymptotic Behaviour - Part II

This is the second in a sequence of three posts that deal with large-sample asymptotics - especially in the context of the linear regression model. The first post dealt with item 1 in this list:
1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
2. The correct way to think about the asymptotic distribution of the OLS estimator.
3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.
No surprise, but this post deals with item 2. To get the most out of it, I strongly recommend reading the first post before proceeding.

The discussion is based on a small Monte Carlo experiment, using EViews. The program file is an extended version of that used for the earlier post, and it's available on the code page that accompanies this blog.

Just to re-cap, the data-generating process (DGP) that's being used is:

y_t = β_1 + β_2 y_{t-1} + ε_t  ;    t = 2, 3, ..., n  ;   y_1 = 0 .

The error term, εt, is generated according to a uniform distribution on the interval (-1 , +1), so it has a mean of zero, and a variance of (1 - (-1))² / 12 = 1/3.

My Monte Carlo experiment uses 5,000 replications, and the values of β1 and β2 have been set to 1.0 and 0.5 respectively in the DGP. Once again, the results will focus on the estimation of β2.
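
For readers without EViews, here's a minimal Python/NumPy sketch of a single draw from this DGP, together with the OLS estimate of β2. It is not the program from the code page: the function names, the seed, and the choice of n = 100 are my own assumptions for illustration, and the starting value of the series is taken to be zero.

```python
import numpy as np

def simulate_y(n, beta1=1.0, beta2=0.5, rng=None):
    """Draw one sample y_1, ..., y_n from the DGP, starting from y_1 = 0 (assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(-1.0, 1.0, size=n)   # errors uniform on (-1, +1)
    y = np.zeros(n)                        # y[0] plays the role of y_1 = 0
    for t in range(1, n):
        y[t] = beta1 + beta2 * y[t - 1] + eps[t]
    return y

def ols_b2(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[1]

rng = np.random.default_rng(2014)          # arbitrary seed, for reproducibility
y = simulate_y(100, rng=rng)
b2 = ols_b2(y)
print(f"n = 100, OLS estimate of beta2: {b2:.3f}")
```

Repeating the last three lines 5,000 times, with fresh errors each time, gives the kind of simulated sampling distribution that the rest of the post is about.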

In finite samples, the OLS estimator of the coefficient vector in this model is biased (due to the presence of the lagged dependent variable as a regressor); and its sampling distribution is non-normal because of the non-normal error term.

However, this estimator is consistent, and if n is large enough then its sampling distribution is well approximated by a normal distribution.

In this post I want to provide some insights into why we usually work with a "scaled" or "normalized" version of an estimator when we're talking about its asymptotic distribution.

To motivate the discussion, let's first take a step back and think about a simpler (but related) problem. Suppose that we take a simple random sample of size n from some population that has a finite mean, μ, and a finite variance, σ². (The population needn't be normal.) Then, the sample average, x* = (1/n)Σ(xi), is an unbiased estimator of μ, and the variance of x* is (σ² / n). That is, the mean and variance of the sampling distribution of x* are μ and (σ² / n) respectively.

Notice, in particular, that because σ² is fixed and finite, the variance of x* eventually goes to zero if the sample size grows indefinitely. This, together with the unbiasedness property, tells us that x* is "mean-square consistent" and hence "weakly consistent", when used as an estimator of μ. (See here for more on these two types of consistency.)
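
The step from a vanishing variance to weak consistency can be made explicit with Chebyshev's inequality. In the lines below, δ is just an arbitrary positive tolerance that I've introduced for this illustration:

```latex
\Pr\left( |x^{*} - \mu| > \delta \right)
  \le \frac{\operatorname{Var}(x^{*})}{\delta^{2}}
  = \frac{\sigma^{2}}{n \delta^{2}}
  \longrightarrow 0 \quad \text{as } n \to \infty ,
  \qquad \text{for any fixed } \delta > 0 .
```

So the probability that x* misses μ by more than any given amount shrinks to zero, which is what weak consistency requires.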

This is similar to, but a little more extreme than, the situation that we saw for the OLS estimator of β2 in the previous post. There, the initial bias of the estimator went to zero, and the variance of the estimator also went to zero, as n got larger and larger. Eventually, the estimator's sampling distribution was just a "spike", centered at the true value of β2, and with negligible width.

If this asymptotic distribution has no width, or dispersion, it sounds as if the "asymptotic variance" of the estimator is zero. In addition, as this will also be the case for any other consistent estimator, how can we compare the large-sample variabilities of such estimators? In other words, how can we talk about relative asymptotic efficiency?

Considering x* as an estimator for μ gives us an example that's very helpful when it comes to answering this question. Suppose that, instead of concentrating on x* itself, we look at the scaled statistic, n^(1/2) x*. Notice that the variance of this statistic is (n^(1/2))² var(x*) = n (σ² / n) = σ². That is, it's constant, and it doesn't vanish as n grows!

By convention we also usually subtract the value of the parameter we're trying to estimate, from the estimator. Then we scale the resulting "estimation error". In this example, what we'd do is look at n^(1/2) (x* - μ). This quantity has a mean of zero and a variance of σ², for any value of n. So, its asymptotic mean is also zero, and its asymptotic variance is also σ². In fact, for this particular example, the asymptotic distribution happens to be Normal, by the Lindeberg-Lévy Central Limit Theorem, but that's not the main point here.
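
A quick simulation makes the contrast concrete. The sketch below is only an illustration: the Exponential(1) population (mean 1, variance 1), the sample sizes, and the seed are all choices I've made for the example. It compares the variance of x* with the variance of the estimation error scaled by the square root of n.

```python
import numpy as np

rng = np.random.default_rng(42)    # arbitrary seed
mu, sigma2 = 1.0, 1.0              # Exponential(1): mean 1, variance 1
reps = 5000                        # number of simulated samples per n

results = {}
for n in (10, 100, 1000):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    xbar = samples.mean(axis=1)              # one sample average per replication
    scaled = np.sqrt(n) * (xbar - mu)        # scaled estimation error
    results[n] = (xbar.var(ddof=1), scaled.var(ddof=1))
    print(f"n = {n:5d}   var(x*) = {results[n][0]:.5f}   "
          f"var(scaled error) = {results[n][1]:.3f}")
```

The first column of variances should shrink roughly in proportion to 1/n, while the second should hover around σ² = 1 for every n, up to simulation noise.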

The important point is that by scaling or normalizing the estimator in this way, we stop its sampling distribution from collapsing to a spike when n is very large. The choice of n^(1/2) as the scaling factor is "just right" for this purpose - not too big, and not too small. (Think of Goldilocks and the three bears!)
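
To see the "Goldilocks" property at work, it may help to consider a general scaling factor, n raised to some power α, applied to the sample-average example. (The exponent α is notation introduced just for this illustration.)

```latex
\operatorname{Var}\!\left[ n^{\alpha} (x^{*} - \mu) \right]
  = n^{2\alpha} \cdot \frac{\sigma^{2}}{n}
  = n^{2\alpha - 1} \sigma^{2}
  \;\longrightarrow\;
  \begin{cases}
    0          & \text{if } \alpha < 1/2 \\
    \sigma^{2} & \text{if } \alpha = 1/2 \\
    \infty     & \text{if } \alpha > 1/2
  \end{cases}
  \qquad \text{as } n \to \infty .
```

Any power below one half still lets the distribution collapse, and any power above it makes the distribution spread out without limit. Only α = 1/2 leaves a finite, non-zero variance.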

The asymptotic distribution now has a variance that can be compared in a meaningful way with the variances of the asymptotic distributions of other potential (consistent) estimators. In each case, the estimation error has to be constructed, and then we scale it.

Scaling by n^(1/2) is usually going to be the appropriate thing to do in the situations you're likely to encounter. There are some exceptions, though. For instance, if we're dealing with non-stationary time-series data, the correct scaling is often by n itself, not n^(1/2). In the context of cointegrated data, the OLS estimator is "super-consistent" - the rate of convergence is n, not n^(1/2). However, that needn't concern us here.

Now, let's return to our regression model, and the OLS estimator for β2. Here's a summary of what we saw in the previous post for the sampling distribution of this estimator as n grew:

In line with what we've just been discussing, we can prevent this "collapse" of the OLS estimator's sampling distribution if, rather than looking at the estimator (b2) itself, we look at n^(1/2) (b2 - β2).

The following histograms correspond to those given in the first post. Here, we want the sampling distribution to end up with a mean of zero (not 0.5), and a "stable" variance. The skewness and kurtosis coefficients are, of course, identical to their counterparts in the first post, because subtracting a constant and multiplying by a positive constant leaves both measures unchanged.

Notice that the mean of the sampling distribution becomes a little closer to zero as n increases from 20 to 100. On the other hand, the standard deviation increases from 0.75 to 0.84. That's O.K.!
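
The reason the skewness and kurtosis coefficients can't change is that both are computed from standardized values, and standardizing undoes any shift and any positive rescaling. Here's a small check, as a sketch only. The "stand-in" draws below are not results from the experiment. They're just an arbitrary skewed sample that I've made up to play the role of the simulated values of b2.

```python
import numpy as np

def skewness(x):
    """Third moment of the standardized values."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def kurtosis(x):
    """Fourth moment of the standardized values (equals 3 for a normal)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4)

rng = np.random.default_rng(7)     # arbitrary seed
n, beta2 = 20, 0.5
# Hypothetical stand-in for Monte Carlo draws of b2: any skewed sample will do
b2_draws = beta2 - rng.gamma(shape=2.0, scale=0.05, size=5000)
scaled = np.sqrt(n) * (b2_draws - beta2)   # shift by beta2, rescale by sqrt(n)

print(f"skewness: {skewness(b2_draws):.4f} vs {skewness(scaled):.4f}")
print(f"kurtosis: {kurtosis(b2_draws):.4f} vs {kurtosis(scaled):.4f}")
```

Each pair of printed values should agree, apart from floating-point rounding, whatever sample is used.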

Now we can see that the asymptotic distribution's mean is getting very close to zero, and the asymptotic standard deviation is "stabilizing" quite nicely. Finally, when n gets even bigger:

Now we see the asymptotic normality of the sampling distribution, and the fact that the asymptotic standard deviation of the scaled estimator has stabilized at a value of approximately 0.85.
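
If you'd like to reproduce this stabilization without EViews, the following Python/NumPy sketch runs a comparable experiment. It is not the program from the code page: the function name, the seed, and the larger sample sizes are assumptions of mine, and the starting value of the series is taken to be zero. The benchmark printed at the end is the standard textbook result for a stationary first-order autoregression, namely that the asymptotic variance of the scaled OLS estimator is (1 - β2²). That result isn't stated in the post, but it squares with the value of roughly 0.85 seen above.

```python
import numpy as np

def scaled_b2_draws(n, reps=5000, beta1=1.0, beta2=0.5, seed=0):
    """Monte Carlo draws of sqrt(n) * (b2 - beta2) for the DGP, with y_1 = 0."""
    rng = np.random.default_rng(seed)
    y = np.zeros((reps, n))                      # column 0 is y_1 = 0
    for t in range(1, n):
        eps_t = rng.uniform(-1.0, 1.0, size=reps)   # one error per replication
        y[:, t] = beta1 + beta2 * y[:, t - 1] + eps_t
    x, z = y[:, :-1], y[:, 1:]                   # regressor y_{t-1}, regressand y_t
    xd = x - x.mean(axis=1, keepdims=True)
    zd = z - z.mean(axis=1, keepdims=True)
    b2 = (xd * zd).sum(axis=1) / (xd ** 2).sum(axis=1)  # OLS slope with intercept
    return np.sqrt(n) * (b2 - beta2)

benchmark = np.sqrt(1.0 - 0.5 ** 2)              # textbook limit, about 0.866
for n in (20, 100, 500):
    d = scaled_b2_draws(n)
    print(f"n = {n:4d}   mean = {d.mean():7.3f}   s.d. = {d.std(ddof=1):.3f}")
print(f"benchmark s.d. = {benchmark:.3f}")
```

The replications are stacked as rows so that only the loop over t runs in Python. That keeps the experiment quick even with 5,000 replications.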

It's the latter value that can now be used to determine the asymptotic efficiency of the OLS estimator relative to other consistent estimators of β2. That's the topic of the next post in this sequence.