Econometrics Beat: Dave Giles' Blog: Some Properties of Non-linear Least Squares

Tuesday, October 30, 2012

Some Properties of Non-linear Least Squares

You probably know that when we have a regression model that is non-linear in the parameters, the Non-Linear Least Squares (NLLS) estimator is generally biased, but it's weakly consistent. This is the case even if the model has non-random regressors and an additive error term that satisfies all of the usual assumptions.

In addition, even if the model’s errors are normally distributed, the NLLS estimator will have a sampling distribution that is non-normal in finite samples, and the usual t-statistics will not be Student-t distributed in finite samples.

In this post I'll illustrate these, and some other results, by using a simple Monte Carlo experiment.

As the sample size grows without limit, eventually the bias of the NLLS estimator vanishes – it's an asymptotically unbiased estimator. Generally, its variance also converges to zero, meaning that the estimator is mean square consistent as well as weakly consistent.
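The step from "bias and variance both vanish" to "consistent" can be written out in one line. This is a sketch; the notation b_n for the NLLS estimator based on n observations, and β for the parameter it estimates, is introduced here and is not used elsewhere in the post:

```latex
\mathrm{MSE}(b_n) = E\big[(b_n - \beta)^2\big]
                  = \mathrm{Var}(b_n) + \big[\mathrm{Bias}(b_n)\big]^2 \;\to\; 0
\quad\Longrightarrow\quad
P\big(|b_n - \beta| > \epsilon\big) \;\le\; \frac{\mathrm{MSE}(b_n)}{\epsilon^2} \;\to\; 0
\quad \text{for every } \epsilon > 0 .
```

The inequality on the right is Chebyshev's (Markov's inequality applied to the squared error), so mean square consistency implies weak consistency.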

Also, as the sample size grows, the sampling distribution of the NLLS estimator becomes (asymptotically) normal, due to the central limit theorem, and the t-statistics have their usual asymptotic properties. In particular, they are asymptotically standard normal.

I've used the EViews workfile and program file, available on the code page for this blog, to illustrate the above results for this very simple non-linear regression model:

y_i = β₁ + (β₂x_2i)^β₁ + ε_i ;  ε_i ~ i.i.d. N[0, σ²] ;  i = 1, 2, …, n.

The values assigned to the parameters are β₁ = 1.0; β₂ = 2.0; and σ = 1.0. Using 5,000 replications in the Monte Carlo experiment, the following results were obtained for the NLLS estimator of β₂ (see the READ_ME text-object in the EViews workfile):

n         Bias(b₂)    Var.(b₂)
10        0.22704     1.00489
25        0.07956     0.22524
50        0.03926     0.08556
100       0.01077     0.02190
250       0.00348     0.00692
500       0.00144     0.00404
1,000     0.00148     0.00230
5,000     0.00049     0.00039
10,000    0.00011     0.00012

(Here, b₂ is the NLLS estimator for β₂.)
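For readers without EViews, here is a minimal Python sketch of this kind of experiment. It is not the program from the code page. The regressor values (a fixed grid on [0.5, 2.5]), the Gauss-Newton routine with step halving, the use of the true values as starting values, and all function names are assumptions made for illustration, so the numbers it produces will not match the table above.

```python
import math
import random

BETA1, BETA2, SIGMA = 1.0, 2.0, 1.0   # true values used in the post


def model(x, b1, b2):
    # Regression function as printed in the post: b1 + (b2 * x) ** b1
    return b1 + (b2 * x) ** b1


def ssr(xs, ys, b1, b2):
    """Sum of squared residuals; inf if the parameters are inadmissible."""
    if b2 <= 1e-8:                     # crude guard: keep the base b2 * x positive
        return math.inf
    try:
        return sum((y - model(x, b1, b2)) ** 2 for x, y in zip(xs, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf


def nlls(xs, ys, b1=BETA1, b2=BETA2, tol=1e-8, max_iter=100):
    """Gauss-Newton with step halving. Returns (b1, b2), or None on failure.

    Starting at the true values is a convenience that is only possible
    in a simulation (an assumption of this sketch).
    """
    current = ssr(xs, ys, b1, b2)
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            p = (b2 * x) ** b1
            d1 = 1.0 + p * math.log(b2 * x)    # derivative w.r.t. b1
            d2 = b1 * p / b2                   # derivative w.r.t. b2
            r = y - (b1 + p)                   # residual
            a11 += d1 * d1
            a12 += d1 * d2
            a22 += d2 * d2
            g1 += d1 * r
            g2 += d2 * r
        det = a11 * a22 - a12 * a12
        if not math.isfinite(det) or det <= 0.0:
            return None
        s1 = (a22 * g1 - a12 * g2) / det       # solve (J'J) s = J'r
        s2 = (a11 * g2 - a12 * g1) / det
        if max(abs(s1), abs(s2)) < tol:
            return b1, b2                      # converged
        step = 1.0
        while step > 1e-10:                    # halve until the SSR does not rise
            t1, t2 = b1 + step * s1, b2 + step * s2
            trial = ssr(xs, ys, t1, t2)
            if trial <= current:
                break
            step *= 0.5
        else:
            return None                        # no acceptable step found
        b1, b2, current = t1, t2, trial
    return None                                # hit the iteration limit


def monte_carlo(n, reps=200, seed=12345):
    """Return (bias, variance, replications used) for the estimator of beta2."""
    rng = random.Random(seed)
    xs = [0.5 + 2.0 * i / (n - 1) for i in range(n)]   # fixed, non-random regressor
    draws = []
    for _ in range(reps):
        ys = [model(x, BETA1, BETA2) + rng.gauss(0.0, SIGMA) for x in xs]
        fit = nlls(xs, ys)
        if fit is not None:                    # skip non-converged replications
            draws.append(fit[1])
    mean = sum(draws) / len(draws)
    var = sum((b - mean) ** 2 for b in draws) / len(draws)
    return mean - BETA2, var, len(draws)


if __name__ == "__main__":
    for n in (25, 100, 400):
        bias, var, used = monte_carlo(n)
        print(f"n={n:4d}  bias={bias: .5f}  var={var:.5f}  used={used}")
```

Dropping the replications that fail to converge is a simplification; with very small samples it can affect the simulated bias and variance, so the count of replications actually used is reported alongside them.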

These results illustrate the bias and the mean-square consistency of the estimator. Although a sample of size 500 appears to be sufficient for the consistency of the estimator to become apparent, the convergence is not so rapid when it comes to the normality of the sampling distributions for the estimator itself and for the associated t-statistic (let's call it t₂) for testing H₀: β₂ = 2. (So, the null hypothesis is true.) This is illustrated in the following charts.

The non-normality of the sampling distribution of b₂ is obvious.

It's clear that the sampling distribution of t₂, the t-statistic, is neither t-distributed nor normally distributed.

The non-normality of the sampling distribution of b₂ is quite clear from the skewness coefficient, and the p-value associated with the Jarque-Bera test statistic.

The non-normality of the sampling distribution of t₂ is really obvious from the skewness coefficient and the very small p-value for the Jarque-Bera test statistic.

Using the Jarque-Bera test, we can't reject the hypothesis that the sampling distribution of b₂ is normal, when n = 10,000.

Finally, we cannot reject the hypothesis that the sampling distribution of t₂ is standard normal when n = 10,000.
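The Jarque-Bera statistic referred to above combines the sample skewness S and kurtosis K as JB = (n/6)[S² + (K − 3)²/4], and is compared with a chi-square distribution with 2 degrees of freedom. Here is a small Python sketch of that calculation. The function name and the demonstration data are illustrative assumptions, and the p-value uses the asymptotic chi-square(2) approximation, whose upper tail probability is exp(−JB/2).

```python
import math
import random


def jarque_bera(sample):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)                   # P(chi-square(2) > jb)
    return skew, kurt, jb, p_value


if __name__ == "__main__":
    rng = random.Random(2012)
    normal_draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    skewed_draws = [rng.expovariate(1.0) for _ in range(5000)]
    for label, data in (("normal", normal_draws), ("exponential", skewed_draws)):
        s, k, jb, p = jarque_bera(data)
        print(f"{label:12s} skew={s: .3f}  kurt={k:.3f}  JB={jb:.2f}  p={p:.4f}")
```

Applied to the simulated values of b₂ or t₂ from a Monte Carlo experiment, a function like this gives the kind of skewness, kurtosis and p-value figures discussed here.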

As we can see, even when we have a very simple non-linear regression model, we may need a VERY large sample size before the asymptotics fully come into play.

This last point is pretty important. Students often ask: "How large does the sample size have to be before we can rely on results that are valid only asymptotically?" The correct answer (and it's not a "cop-out") is: "It depends."

It depends on the particular problem, the data values, and often the true values of the unobservable parameters. So, be cautious!

6 comments:

Carlos Cinelli, October 31, 2012 at 5:59 AM
Hi, Giles,

I think this is a case where it is very easy to conflate statistical significance with practical significance.

See, for example, in the last case, with n = 10,000 for t2, the p-value is indeed small, but that doesn't really matter.

What we want to know is whether the departure from normality is big enough that it wouldn't be OK to use a t-test. That is, we want to know if the departures in skewness and kurtosis are big enough that the normal or t distribution would be a bad approximation.

The p-value will not tell us how big the departure is. Even if the distribution were almost exactly normal, say with Skew = 0.0000001 and Kurt = 3.000001, a huge enough sample could give p < 0.0000001.

So what we ought to do in this case is to see whether the deviation in skewness and kurtosis suggested by the data is big enough to matter in practice.

Some people say that a skewness below 0.3 and a kurtosis between 2.5 and 3.5 could be OK for most purposes.

So, in that case, n = 500 could already be a large enough sample, even though we would still reject perfect normality.

Best

Carlos
Anonymous, October 31, 2012 at 10:47 AM
Very interesting! IV estimators are of course another important case in point.