Comments on Econometrics Beat: Dave Giles' Blog: "Drawing Inferences From Very Large Data-Sets"

Enrique Saint-Pierre (2014-01-09, https://www.blogger.com/profile/15923310674705326108):

I think the problem is the confusion between statistical and economic significance. In your example of 0.01, you are not wrong to reject the null. After all, 0.01 is not 0, and that difference could be important in some context. Maybe we should change the nulls we test (e.g., H0: b < 0.1, or whatever we think is economically relevant) instead of changing the way we interpret p-values.

Dave Giles (2011-04-27, https://www.blogger.com/profile/05389606956062019445):

Sinclair: Thanks for the kind feedback. Glad that the post was helpful.

Dave Giles (2011-04-27, https://www.blogger.com/profile/05389606956062019445):

Anonymous: Thanks for the very useful comment. And yes, you do make sense. I think I could have expressed a couple of points a little better in the posting.

Let me try again. I agree (and discussed why, in an earlier posting) that the null distribution of the p-value is independent of the sample size – indeed, it is simply uniform on [0, 1]. This is not the case under the alternative.
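[The uniformity claim is easy to check by simulation. The following is an illustrative sketch in Python rather than EViews, and is not part of the experiment discussed in this thread: it generates data under a true null, computes two-sided test p-values, and shows that they spread roughly evenly over [0, 1].]

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
n, reps = 200, 10_000

# Two-sided p-values for H0: mean = 0, with every sample drawn UNDER the null.
pvals = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    t = x.mean() / (x.std(ddof=1) / sqrt(n))
    # 2 * (1 - Phi(|t|)), using the large-n normal approximation to the t-statistic
    pvals[r] = 1 - erf(abs(t) / sqrt(2))

# Under a true null the p-value is (approximately) Uniform[0, 1]:
# about 10% of the simulated p-values land in each decile, whatever n is.
decile_shares = np.histogram(pvals, bins=10, range=(0, 1))[0] / reps
print(decile_shares)
```

Under the alternative, by contrast, the distribution of the p-value piles up near zero as n grows, which is the point the rest of this comment develops.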
Quoting from Granger (1998, p. 260): "In simple cases where confidence intervals are O(1/n) they will be effectively zero [in length; D.G.], so that virtually any parsimonious parametric model will produce a very low p-value and will be strongly rejected by any standard hypothesis test using the usual confidence levels. Virtually all specific null hypotheses will be rejected using present standards [5%, 1% significance levels; D.G.]."

It's worth adding that in the case of cointegrated data, where the convergence rate for the asymptotics is O(n) rather than O(n^1/2), the confidence interval lengths are O(n^-2), and the situation is even more extreme!

The point to take away from the posting is that if the null is just a tiny bit false – not to the extent that it makes any real economic difference – then we'll always reject that null if we have a big enough sample size and we use conventional significance levels. To guard against this, it would be wise to dramatically reduce the magnitude of the p-value that we require in order to reject the null.

Take an example: estimating a simple OLS regression model under ideal assumptions. We'll test whether the slope coefficient is zero, against the alternative that it's positive, using the t-test. This test is UMP against one-sided alternatives. Under the null, the test statistic is standard normally distributed when n is large.

I've conducted a Monte Carlo experiment (1,000 replications), generating the y data using an intercept coefficient of unity, a slope coefficient of 0.01, and i.i.d. N[0,1] errors. The null we're testing is therefore slightly false.
If the data were measured in logarithms, the slope coefficient would be an elasticity, and in economic terms it probably makes very little difference whether that elasticity is zero or 0.01.

The EViews workfile and program file for the experiment are on this blog's Code page.

The averages of the 1,000 t-statistics and of their one-sided p-values are:

n = 100: average t = 0.088; average p = 0.252
n = 5,000: average t = 0.676; average p = 0.217
n = 50,000: average t = 2.270; average p = 0.054
n = 100,000: average t = 3.193; average p = 0.022

(Note that I have reported the average of the 1,000 p-values, not the p-value of the average of the 1,000 t-statistics.)

For this example, with a sample of 50,000 observations, the false null is not quite rejected at the 5% significance level, but on average we'd reject it at the 10% level. We finally get a rejection at both of these significance levels when n = 100,000. But is there any real economic sense in rejecting? After all, the true value of the parameter is 0.01 (rather than exactly zero).

If we then increase the sample size to n = 250,000, the averages of the t-statistics and their p-values are 4.993 and 0.001, respectively. Now I'd say there's a clear rejection of the null.

It's in this sense that we probably should be insisting on really, really small p-values before we reject the null when the sample size is extremely large.

Incidentally, I see that Andrew Gelman was blogging on a similar point a couple of years ago, albeit in a slightly different context. See http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/the_sample_size.html

Anonymous – if I'm reading your comment correctly, I don't think we're in disagreement on any of this.
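[A rough replication sketch of this experiment, in Python/numpy rather than the EViews code on the blog's Code page. The regressor design is an assumption (i.i.d. N(0,1) draws; the original design isn't stated here), so the averages will only loosely track the numbers reported in the comment, but the pattern – the t-statistic drifting up and the p-value shrinking as n grows, with a fixed true slope of 0.01 – is the same.]

```python
import numpy as np
from math import erf, sqrt

def avg_t_and_p(n, reps=1000, seed=123):
    """Average one-sided t-statistic and p-value for H0: slope = 0 vs slope > 0,
    when the true slope is 0.01, so the null is slightly false."""
    rng = np.random.default_rng(seed)
    ts = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(n)                 # ASSUMED regressor design
        y = 1.0 + 0.01 * x + rng.standard_normal(n)  # intercept 1, slope 0.01
        xc = x - x.mean()
        b = (xc @ y) / (xc @ xc)                   # OLS slope estimate
        resid = y - y.mean() - b * xc
        s2 = (resid @ resid) / (n - 2)             # unbiased error variance
        ts[r] = b / sqrt(s2 / (xc @ xc))           # t-statistic for the slope
    # One-sided p-value, 1 - Phi(t), via the large-n normal approximation
    ps = np.array([1 - 0.5 * (1 + erf(t / sqrt(2))) for t in ts])
    return ts.mean(), ps.mean()

for n in (100, 5_000, 50_000):
    t_bar, p_bar = avg_t_and_p(n)
    print(f"n = {n:>6}: average t = {t_bar:.3f}, average p = {p_bar:.3f}")
```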
Again, thanks for the helpful input.

Sinclair Davidson (2011-04-27, http://www.catallaxyfiles.com):

Fantastic post - I often get into arguments about large sample sizes and barely statistically significant p-values, so it's good to have an easily accessible blog post to point to.

Anonymous (2011-04-26):

As a reader of the blog I should probably understand p-values better. Having said that, a p-value of 0.01, with any number of samples, is just as indicative of the falseness of the null hypothesis (since, given the null, the p-value has the same distribution). For large samples, large p-values should accompany only minuscule alternative hypotheses, which can make them uninteresting. A combination of large effects and large p-values can indicate looseness of the model (regardless of sample size), which is prone to over-fitting (another familiar malaise). Do I make sense?