Tuesday, November 11, 2014

Normality Testing & Non-Stationary Data

Bob Jensen emailed me about my recent post about the way in which the Jarque-Bera test can be affected when temporally aggregated data are used. Apparently he publicized my post on the listserv for Accounting Educators in the U.S. He also drew my attention to a paper by two former presidents of the AAA: "Some Methodological Deficiencies in Empirical Research Articles in Accounting", by Thomas R. Dyckman and Stephen A. Zeff, Accounting Horizons, September 2014, 28 (3), 695-712. (Here.) 

Bob commented that an even more important issue might be that our data may be non-stationary. Indeed, this is always something that should concern us, and regular readers of this blog will know that non-stationary data, cointegration, and the like have been the subject of a lot of my posts.

In fact, the impact of unit roots on the Jarque-Bera test was mentioned in this old post about "spurious regressions". There, I mentioned a paper of mine (Giles, 2007) in which I proved that:

"The Jarque-Bera (JB) normality test. When applied to a "spurious regression" model, (T−1JB) converges weakly as, T ↑ ∞ and so JB itself diverges at the rate “T”.
The Breusch-Pagan-Godfrey homoskedasticity test. When applied to a "spurious regression" model, the statistic for the TR2 version of the test diverges at the rate "T" as T ↑ ∞ . The same is true for the (SSR/2) version of the test.
So, testing for the normality or homoskedasticity of the errors in a "spurious regression" will always lead to a rejection of the associated null hypotheses, for large enough T, whether these hypotheses are false or true".
The first of these two results actually addressed the concern that Bob rightly expressed.

My paper included the formal analytical derivations of these results, based on the non-standard asymptotics used by Phillips (1986), and others. It also included some illustrative numerical evaluations. 

Let's look at a different way of illustrating the behaviour of the JB test in the context of a spurious regression. It's based on a simple simulation experiment, using EViews. The program can be found on the code page for this blog, and that file can be read with any text editor.

The experiment involves estimating 1,000 OLS regressions of the form:

           y_t = α + βx_t + ε_t     ;   t = 1, 2, ..., T

where,

          y_t = ρy_{t-1} + u_t       ;    u_t ~ iid N[0 , 1]

          x_t = ρx_{t-1} + v_t       ;    v_t ~ iid N[0 , 1]  .

The associated JB statistics are computed, and this gives a representation of that statistic's sampling distribution. This is done for various sample sizes, T. Two values of ρ are considered. First, ρ = 0, so that the data are stationary. This provides a benchmark situation where the JB test should perform as intended. Second, ρ = 1, so that both y and x have a unit root (but are not cointegrated). This is the spurious regression case.
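The experiment is easy to replicate. Here's a minimal sketch in Python (the original program is in EViews; the function names, the fixed seed, and the default replication count are my own illustrative choices):

```python
import numpy as np

def jarque_bera(e):
    """Jarque-Bera statistic: JB = T/6 * (S^2 + (K - 3)^2 / 4)."""
    T = len(e)
    s = (e - e.mean()) / e.std()          # standardized residuals
    skew = np.mean(s**3)
    kurt = np.mean(s**4)
    return T / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)

def simulate_jb(T, rho, n_reps=1000, seed=42):
    """Median JB statistic from OLS residuals of y on x,
    where y and x are independent AR(1) processes with parameter rho."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_reps):
        u = rng.standard_normal(T)
        v = rng.standard_normal(T)
        y = np.empty(T)
        x = np.empty(T)
        y[0], x[0] = u[0], v[0]
        for t in range(1, T):
            y[t] = rho * y[t - 1] + u[t]
            x[t] = rho * x[t - 1] + v[t]
        # OLS of y on a constant and x
        X = np.column_stack([np.ones(T), x])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        stats.append(jarque_bera(resid))
    return np.median(stats)
```

For ρ = 0 the median JB should sit near the χ²(2) median; for ρ = 1 it should grow with T, as the results below illustrate.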

Here are some results, for different values of T, when ρ = 0. (Clicking on the graphs will make them larger.)

Because the null hypothesis of normality is true, the JB statistic should have an asymptotic distribution that is χ2 with 2 degrees of freedom. So, the mean and variance of this distribution should be 2 and 4 respectively. We see that these results are essentially satisfied by the time the sample is as large as T = 5,000.
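As a sanity check on this benchmark case, one can simulate the JB statistic directly for iid normal samples and compare its moments with those of the χ²(2) distribution. A minimal numpy sketch (the sample size, replication count, and seed are illustrative choices of mine):

```python
import numpy as np

# Simulate the JB statistic for iid standard normal samples of size T.
rng = np.random.default_rng(0)
T, n_reps = 5000, 2000
jb = np.empty(n_reps)
for i in range(n_reps):
    z = rng.standard_normal(T)
    s = (z - z.mean()) / z.std()          # standardized sample
    jb[i] = T / 6 * (np.mean(s**3)**2 + (np.mean(s**4) - 3)**2 / 4)

# Under normality JB is asymptotically chi-squared with 2 d.o.f. -
# i.e. Exponential with mean 2: mean 2, variance 4, median 2*ln(2) ~ 1.386.
print(jb.mean())
print(jb.var())
print(np.median(jb))
```

With T = 5,000 the simulated mean and variance should land close to 2 and 4, consistent with the histograms above.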

Now let's look at the corresponding results when ρ = 1:

Clearly, the JB statistic is no longer behaving in the same way as it did when the data were stationary. The median value of JB over the 1,000 replications increases from 2.2, to 20.1, and then to 215.4 as T increases from 50, to 500, and then to 5,000.

Finally, let's really push things a little, and see what happens when T = 50,000:

Now the median value of the JB statistic is over 2,000. We can really see the divergence of the sampling distribution.

Do you have a good eye? The median values of JB went up by roughly a factor of 10 each time T was increased by a factor of 10. In other words, the JB values are "exploding" at the same rate as T. This corresponds precisely with the analytic results in Giles (2007).
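This rate-"T" divergence can also be checked numerically: if JB diverges at rate T, then scaling T up by a factor of 10 should scale the median JB by roughly 10 as well. A compact sketch for the ρ = 1 case, with the random walks generated by cumulative sums (the replication count and seed are illustrative choices of mine):

```python
import numpy as np

def median_jb_rw(T, n_reps=400, seed=1):
    """Median JB from OLS residuals when y and x are independent random walks."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_reps):
        y = np.cumsum(rng.standard_normal(T))   # random walk y
        x = np.cumsum(rng.standard_normal(T))   # independent random walk x
        X = np.column_stack([np.ones(T), x])
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        z = (e - e.mean()) / e.std()
        jb = T / 6 * (np.mean(z**3)**2 + (np.mean(z**4) - 3)**2 / 4)
        stats.append(jb)
    return np.median(stats)

# Ratio of median JB values when T goes from 500 to 5,000:
r = median_jb_rw(5000) / median_jb_rw(500)
print(r)   # should be roughly 10
```

The ratio hovering around 10 is the numerical counterpart of the rate-T divergence derived in Giles (2007).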


References

Giles, D. E. A., 2007. Spurious regressions with time-series data: Further asymptotic results. Communications in Statistics - Theory and Methods, 36, 967-979. (Free download here.)

Phillips, P. C. B., 1986. Understanding spurious regressions in econometrics. Journal of Econometrics, 33, 311-340.



© 2014, David E. Giles
