## Monday, October 13, 2014

### Illustrating Asymptotic Behaviour - Part III

This is the third in a sequence of posts about some basic concepts relating to large-sample asymptotics and the linear regression model. The first two posts (here and here) dealt with items 1 and 2 in the following list, and you'll find it helpful to read them before proceeding with this post:

1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
2. The correct way to think about the asymptotic distribution of the OLS estimator.
3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.

Here, we're going to deal with item 3, again via a small Monte Carlo experiment, using EViews.

The program file is on the code page that accompanies this blog. It's a slightly extended version of the program used in the previous, related, post.

The data-generating process (DGP) that's being used is:

$$y_t = \beta_1 + \beta_2 y_{t-1} + \varepsilon_t \; ; \quad t = 2, 3, \ldots, n \; ; \quad y_1 = 0 .$$

This time, however, the error term (εt) is generated as the sum of a draw from a uniform distribution on the interval (-1 , +1) and a draw from a Student-t distribution with 3 degrees of freedom. So, it has a mean of zero and, if the two components are independent, a variance of 1/3 + 3 = 3.3333. This will result in some "outliers" in the data for yt.
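For readers who don't use EViews, here is a minimal Python sketch of this DGP. It is not the program from the code page. The function name `simulate_ar1` and the seed are my own choices, and I'm assuming that the uniform and Student-t components are drawn independently and that the series starts at zero.

```python
import numpy as np

def simulate_ar1(n, beta1=1.0, beta2=0.5, rng=None):
    """Simulate y_t = beta1 + beta2 * y_{t-1} + eps_t, starting the series at zero."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the error is a Uniform(-1, 1) draw plus an independent t(3) draw.
    eps = rng.uniform(-1.0, 1.0, size=n) + rng.standard_t(df=3, size=n)
    y = np.zeros(n)  # y[0] is the starting value of zero
    for t in range(1, n):
        y[t] = beta1 + beta2 * y[t - 1] + eps[t]
    return y

# One artificial sample of 100 observations (the seed is arbitrary).
y = simulate_ar1(100, rng=np.random.default_rng(12345))
print(y[:5])
```

The occasional large draw from the t(3) component is what produces the "outliers" in the simulated series.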

The Monte Carlo experiment uses 5,000 replications, and the values of β1 and β2 have been set to 1.0 and 0.5 respectively in the DGP. We'll focus on the estimation of β2.

In this context, the OLS estimator of the coefficients is biased, with a non-normal sampling distribution in finite samples. However, as was shown in the previous two posts, this estimator is consistent, with an asymptotic distribution that is normal.

The final part of our exercise involves considering an alternative estimator to OLS - the Least Absolute Deviations (LAD) estimator. This is just the quantile regression estimator with the quantile choice set to 0.5. This estimator is relatively more robust than OLS to outliers in the data. It's the regression counterpart to using the sample median, rather than the sample mean, to estimate a population mean.
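To make the robustness point concrete, here is a small Python sketch that fits a straight line by LAD and by OLS to five made-up data points, one of which is an outlier. The function name `lad_line` and the data are purely illustrative. The LAD fit is found by brute force: with an intercept and one slope, some line through two of the data points always minimizes the sum of absolute residuals, so the sketch just tries them all. That is fine for small samples, but it is not the algorithm a package's quantile regression routine would use.

```python
import numpy as np

def lad_line(x, y):
    """Exact LAD fit of y = a + b*x, found by trying every line through two data points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)       # all pairs of observations
    keep = x[i] != x[j]                        # skip pairs that would give a vertical line
    i, j = i[keep], j[keep]
    b = (y[j] - y[i]) / (x[j] - x[i])          # candidate slopes
    a = y[i] - b * x[i]                        # candidate intercepts
    # Sum of absolute residuals for each candidate line.
    sad = np.abs(y[None, :] - a[:, None] - b[:, None] * x[None, :]).sum(axis=1)
    k = np.argmin(sad)
    return float(a[k]), float(b[k])

# Hypothetical data: y = 1 + 2x exactly, except for one outlier at the end.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])

a_lad, b_lad = lad_line(x, y)
b_ols, a_ols = np.polyfit(x, y, 1)             # polyfit returns slope first, then intercept
print("LAD:", a_lad, b_lad)
print("OLS:", a_ols, b_ols)
```

In this example the LAD line ignores the outlier and goes through the other four points, while the OLS slope is dragged a long way towards it.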

The behaviour of the LAD estimator can be compared with that of OLS - specifically we can consider the issue of asymptotic efficiency. For reasons that were explained in detail in the last related post, the estimators' sampling distributions that we'll be considering are for the scaled (normalized) "estimation errors": $n^{1/2}(b_2 - \beta_2)$ and $n^{1/2}(b^*_2 - \beta_2)$. Here, $b_2$ is the OLS estimator of β2, and $b^*_2$ is the corresponding LAD estimator.

The LAD estimator is also biased in the present context, but it is also a consistent estimator. This means that we can compare the variances (or standard deviations) of the sampling distributions for the normalized forms of b2 and b*2 when n is extremely large. This will enable us to determine which estimator is relatively more efficient, asymptotically.
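The comparison just described can be sketched in Python as follows. This is a scaled-down illustration under my own assumptions, not the EViews program: the function names, the seed, the sample size (n = 50) and the 200 replications are all arbitrary choices, the two error components are assumed independent, and the LAD slope is found by brute force over lines through pairs of points, which is only practical for small n.

```python
import numpy as np

def simulate_ar1(n, rng, beta1=1.0, beta2=0.5):
    # Assumption: Uniform(-1, 1) plus an independent t(3) draw; series starts at zero.
    eps = rng.uniform(-1.0, 1.0, size=n) + rng.standard_t(df=3, size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = beta1 + beta2 * y[t - 1] + eps[t]
    return y

def ols_slope(x, y):
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def lad_slope(x, y):
    # Brute force: some line through two data points minimizes the sum of |residuals|.
    i, j = np.triu_indices(len(x), k=1)
    keep = x[i] != x[j]
    i, j = i[keep], j[keep]
    b = (y[j] - y[i]) / (x[j] - x[i])
    a = y[i] - b * x[i]
    sad = np.abs(y[None, :] - a[:, None] - b[:, None] * x[None, :]).sum(axis=1)
    return float(b[np.argmin(sad)])

def scaled_errors(n, reps, beta2=0.5, seed=0):
    """Return a (reps, 2) array of sqrt(n) * (estimate - beta2) for OLS and LAD."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, 2))
    for r in range(reps):
        y = simulate_ar1(n, rng, beta2=beta2)
        x_lag, y_cur = y[:-1], y[1:]           # regress y_t on a constant and y_{t-1}
        out[r, 0] = np.sqrt(n) * (ols_slope(x_lag, y_cur) - beta2)
        out[r, 1] = np.sqrt(n) * (lad_slope(x_lag, y_cur) - beta2)
    return out

errs = scaled_errors(n=50, reps=200)
bias = errs.mean(axis=0)
var = errs.var(axis=0)
mse = (errs ** 2).mean(axis=0)                 # equals var + bias**2
for name, b_, v_, m_ in zip(("OLS", "LAD"), bias, var, mse):
    print(f"{name}: bias={b_:.3f}  variance={v_:.3f}  MSE={m_:.3f}")
```

With so few replications the numbers will be noisy, so don't expect them to match the EViews results. The point is just to show which quantities are being compared.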

Let's look at some results, for increasing values of n. First, when n = 20, the OLS estimator of β2 is more "precise" and more biased than the LAD estimator, but it's relatively more efficient, as can be seen from the MSE values.

By the time n = 100, the biases and variances have all decreased, but the bias-variance trade-off is such that now the LAD estimator is relatively more efficient than the OLS estimator. Now, let's increase the sample size further:

So, what we're seeing here is a pair of estimators that have different properties when the sample size is small, but are both asymptotically unbiased and consistent. However, the variance of the asymptotic distribution of the LAD estimator is smaller than that of the asymptotic distribution of the OLS estimator.

For the problem that we're considering here, the LAD estimator is relatively more efficient, asymptotically, than is the OLS estimator. Notice that this is not something that we would have been able to discern if we hadn't normalized the two estimators. Without the normalization, the large-sample variances would each have been zero in value.
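Here's a heuristic way to see why the scaling matters. It's a sketch rather than a formal argument, and V and V* are my own labels for the variances of the two asymptotic distributions:

$$
\sqrt{n}\,(b_2 - \beta_2) \xrightarrow{d} N(0, V), \qquad \sqrt{n}\,(b^*_2 - \beta_2) \xrightarrow{d} N(0, V^*)
$$

$$
\operatorname{Var}(b_2) \approx \frac{V}{n} \to 0, \qquad \operatorname{Var}(b^*_2) \approx \frac{V^*}{n} \to 0 \qquad \text{as } n \to \infty
$$

Both un-normalized variances collapse to zero, so they can't be used to rank the estimators. The comparison has to be between V and V*, and in this experiment it's V* that is the smaller of the two.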

It's worth noting that the superior asymptotic efficiency of the LAD estimator is a result of the (very) non-normal errors in our regression model. The converse result would hold if the errors were normally distributed.

As for the sample sizes that are needed to achieve the asymptotic normality of the two estimators, the Jarque-Bera test statistics (p-values) tell the story:

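| n | JB statistic (p-value) | JB statistic (p-value) |
|-------|---------------|---------------|
| 1,000 | 29.60 (0.00) | 11.01 (0.00) |
| 1,500 | 11.60 (0.00) | 4.98 (0.08) |
| 5,000 | 2.62 (0.27) | 2.48 (0.29) |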
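As a reminder of what's being computed, here is a short Python sketch of the Jarque-Bera statistic in its usual textbook form, with the p-value taken from a chi-square distribution with 2 degrees of freedom. The function names are my own, and I'm assuming this textbook form. I haven't checked exactly how EViews computes it.

```python
import math

def chi2_2_pvalue(jb):
    """Upper-tail probability of a chi-square(2) variable, which is exp(-x/2)."""
    return math.exp(-jb / 2.0)

def jarque_bera(sample):
    """Textbook JB statistic: n/6 * (S^2 + (K - 3)^2 / 4), with its chi-square(2) p-value."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5          # sample skewness, S
    kurt = m4 / m2 ** 2            # sample kurtosis, K (3 for a normal distribution)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, chi2_2_pvalue(jb)

# p-values implied by the two statistics reported for n = 5,000.
for stat in (2.62, 2.48):
    print(stat, round(chi2_2_pvalue(stat), 2))
```

Applying `jarque_bera` to the replicated values of the normalized estimation errors, for each sample size, would give statistics of the kind reported in the table.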