Saturday, October 11, 2014

Illustrating Asymptotic Behaviour - Part I

Learning the basics about the (large sample) asymptotic behaviour of estimators and test statistics is always a challenge. Teaching this material can be challenging too!

So, in this post and in two more to follow, I'm going to talk about a small Monte Carlo experiment that illustrates some aspects of the asymptotic behaviour of the OLS estimator. I'll focus on three things:
  1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
  2. The correct way to think about the asymptotic distribution of the OLS estimator.
  3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.

The EViews program that I've written is on the code page for this blog, and it generates its own EViews workfile. The program will get extended for the subsequent posts, and re-linked. (If you're not an EViews user, you can read the program file with any text editor, and there are lots of comments in the code to explain what's going on.)

The data-generating process (DGP) that's used for all three posts is:

                 yt = β1 + β2 yt-1 + εt  ;    t = 2, 3, ..., n  ;   y1 = 0 .

The error term, εt, is generated according to a uniform distribution on the interval (-1 , +1), so it has a mean of zero, and a variance of 1/3 (approximately 0.3333).
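The original program is written in EViews, but the DGP itself is easy to sketch in any language. Here is a minimal, hypothetical Python version (the function name and seed are my own choices, not part of the post):

```python
import numpy as np

# Sketch of the post's DGP: y_t = beta1 + beta2*y_{t-1} + eps_t,
# with eps_t ~ Uniform(-1, 1) (mean 0, variance 1/3) and y_1 = 0.
def simulate_ar1(n, beta1=1.0, beta2=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(-1.0, 1.0, size=n)  # deliberately non-normal errors
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = beta1 + beta2 * y[t - 1] + eps[t]
    return y

y = simulate_ar1(20, rng=np.random.default_rng(42))
```

One such call produces a single sample of size n = 20, matching the smallest sample size used below.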

The errors have been chosen to be non-normal deliberately. One of the things that I want us to see is how the sampling distribution of the OLS estimator (which will be non-normal in finite samples in this case) eventually becomes normal as the sample size grows.

My Monte Carlo experiment uses 5,000 replications, and the values of β1 and β2 have been set to 1.0 and 0.5 respectively in the DGP. In what follows the results will focus on the estimation of β2, but this is not especially important.

In this post I'll be concentrating on just the first of the three numbered items above. The thing to keep in mind is that the OLS estimator is a random variable, so it has a probability distribution. Because the estimator is a sample statistic (a function of the random sample data), we call this distribution the "sampling distribution" of the estimator. The form that this distribution takes depends on the size of the sample that we're using.

This is important. Typically, the value of the point estimate will change as n grows. However, more fundamentally, the distribution of the (random) estimator will also change, and this is what we will be interested in here.

To begin with, 5,000 different random samples (of size n = 20) have been generated, and in each case the model has then been estimated by OLS. The 5,000 point estimates of β2 have been saved. The distribution of these 5,000 values is a very close approximation to the true sampling distribution of the estimator. (The full sampling distribution would require that we do this an infinity of times, not 5,000, but we don't have time for that!)
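The replication loop can be sketched as follows. This is a hypothetical Python stand-in for the EViews program (the helper names are mine); it draws each sample from the DGP, fits the regression of yt on a constant and yt-1 by OLS, and stores the slope estimates:

```python
import numpy as np

# Regress y_t on a constant and y_{t-1}; return the OLS slope estimate.
def ols_slope(y):
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return b[1]

# Monte Carlo: draw `reps` samples of size n from the DGP and collect
# the OLS estimates of beta2.
def mc_beta2(n, reps=5000, beta1=1.0, beta2=0.5, seed=123):
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        eps = rng.uniform(-1.0, 1.0, size=n)
        y = np.zeros(n)
        for t in range(1, n):
            y[t] = beta1 + beta2 * y[t - 1] + eps[t]
        est[r] = ols_slope(y)
    return est

est20 = mc_beta2(20)
# The mean of est20 typically falls well below 0.5, illustrating the
# small-sample bias discussed below.
```

The empirical distribution of the 5,000 stored estimates then approximates the sampling distribution of the estimator for that n.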

With n = 20, the sampling distribution of the OLS estimator of β2 looks like this:
The mean of the sampling distribution is approximately 0.42, compared with the true value of β2, which is 0.5. So, the OLS estimator is downward-biased as a result of having a lag of the dependent variable as a regressor. Notice that the standard deviation of the estimator (i.e., of its sampling distribution) is approximately 0.17. Finally, we can see from the Jarque-Bera test that the estimator's sampling distribution is non-normal. In particular, the distribution is substantially skewed to the left.
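For readers who haven't met it, the Jarque-Bera statistic reported alongside the histograms is a simple function of the sample skewness and kurtosis. A minimal Python sketch (my own implementation, not the EViews routine):

```python
import numpy as np

# Jarque-Bera statistic: JB = (n/6) * (S^2 + (K - 3)^2 / 4), where S is
# the sample skewness and K the sample kurtosis. Under normality, JB is
# asymptotically chi-squared with 2 degrees of freedom.
def jarque_bera(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    s2 = z.var()                       # population variance (ddof = 0)
    skew = (z ** 3).mean() / s2 ** 1.5
    kurt = (z ** 4).mean() / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Sanity check: skewed (exponential) draws give a large JB value,
# while normal draws give a small one.
rng = np.random.default_rng(0)
jb_norm = jarque_bera(rng.standard_normal(5000))
jb_skew = jarque_bera(rng.exponential(size=5000))
```

A large JB value, relative to the chi-squared(2) distribution, is evidence against normality, which is what the skewed n = 20 sampling distribution exhibits.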

Increasing the sample size to n = 100, and then to n = 250, we get the following results:

We can see that as the sample size grows, the bias of the OLS estimator is decreasing - the expected value of the (sampling distribution of the) estimator is getting closer to the true value of the parameter, β2 = 0.5. In addition, the variability of the estimator is decreasing. The standard deviation of the sampling distribution falls from 0.17 when n = 20, to 0.08 when n = 100, and then to 0.05 when n = 250. However, even with the last of these sample sizes, the distribution of the OLS estimator is still non-normal.

(When interpreting these graphs, and the ones below, keep an eye on the scale for the horizontal axis - typically it will be changing. This is why we might still see quite a lot of "spread" in the distributions, even though the standard deviation is declining.)

Finally, when we set n = 1,000, and then n = 5,000, this is what we get:

In this particular example, we need quite a large sample size before the sampling distribution of our OLS estimator is approximately normal. In particular, there's a negative skewness to this distribution that's noticeable even for moderate sample sizes.

The vanishing bias, and the decreasing variability of the estimator (as n grows) can be seen more directly in this graph:
What can be observed here is the consistency of the OLS estimator. (For a discussion of different types of "consistency", see this earlier post.) If the sample size could be made infinitely large, the density for the sampling distribution would collapse to a degenerate "spike", centered at the true value of β2, and with negligible width (spread).
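A crude, single-sample illustration of this: with one very long sample from the DGP, the OLS slope estimate lands essentially on top of β2 = 0.5. This hypothetical Python sketch (not part of the original EViews program) makes the point:

```python
import numpy as np

# Consistency in action: one very long sample from the DGP, then OLS.
rng = np.random.default_rng(7)
n = 200_000
eps = rng.uniform(-1.0, 1.0, size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.5 * y[t - 1] + eps[t]

# Regress y_t on a constant and y_{t-1}; the slope should be very
# close to the true value, 0.5.
X = np.column_stack([np.ones(n - 1), y[:-1]])
b2 = np.linalg.lstsq(X, y[1:], rcond=None)[0][1]
```

With n this large, both the bias and the sampling variability have all but vanished, which is exactly the "collapsing spike" behaviour described above.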

At that final, hypothetical, stage the sampling distribution wouldn't really be a proper normal distribution. Also, as the "spike" would have no dispersion, it's not clear what it would mean to talk about the "asymptotic variance" of the estimator. Isn't it just zero, like the asymptotic bias?

This last issue will be the topic for the next post in this sequence.

© 2014, David E. Giles


  1. Hi,

    Thanks for this very helpful entry. In my class I also use a similar MC experiment to demonstrate the consistency of the OLS estimator of the AR(1) process. I use Stata, and I thought it may be helpful for those who are interested in Stata programming. So here is the code with the same DGP and parameter values. (PS: I admit that there may be more elegant ways of coding this :))

    // Consistency of OLS estimator of AR(1) model

    capt prog drop arols
    program arols, rclass
    version 12
    syntax [, n(int 50) burnin(int 100)]
    drop _all
    set obs `n'
    gen double u = -1 + 2*runiform() // random error, from U(-1,1)
    gen y = 0
    replace y = 1 + 0.5*y[_n-1] + u in 2/`n'
    drop in 1/`burnin' // discard the burn-in observations
    gen t = _n
    tsset t
    reg y l.y
    ret sca b1 = _b[_cons]
    ret sca b2 = _b[l.y]
    end

    graph drop _all
    glo numsim = 5000
    local burnin = 200
    foreach T of numlist 20 100 250 1000 5000 {
    local n = `burnin' + `T'
    simulate ARpar`T'=r(b2), ///
    reps($numsim) saving(OLSMC`T', replace) nolegend nodots: ///
    arols, n(`n') burnin(`burnin')
    hist ARpar`T', normal name(n`T') title("AR(1) Parameter: n=`T'")
    }

    use OLSMC20, clear
    foreach T of numlist 100 250 1000 5000 {
    merge using OLSMC`T'
    drop _merge
    }
    tw (kdensity ARpar20) (kdensity ARpar100) (kdensity ARpar250) (kdensity ARpar1000) (kdensity ARpar5000)

    The code above draws the kernel density estimates similar to the last graph in your post. Also here are the summary statistics:

    Variable  |  Obs    Mean       Std. Dev.   Min         Max
    ----------+------------------------------------------------
    ARpar20   | 5000   .371518    .2144403   -.5469691   .9904804
    ARpar100  | 5000   .4751635   .0893891    .0822227   .7326534
    ARpar250  | 5000   .4906103   .0546967    .2765988   .6818358
    ARpar1000 | 5000   .4973098   .0272844    .3987602   .5796224
    ARpar5000 | 5000   .4998611   .0121114    .4362852   .5412318

  2. Very useful to understand the concept of consistency