Students of econometrics are familiar with the "spurious regression" problem that can arise with (non-stationary) time-series data.
As was pointed out by Granger and Newbold (1974), the "levels" of many economic time-series are integrated (or nearly so), and if these data are used in a regression model then a high value for the coefficient of determination (R²) is likely to arise, even when the series are actually independent of each other.
They also demonstrated that the associated regression residuals are likely to be positively autocorrelated, resulting in a very low value for the Durbin-Watson (DW) statistic. There was a time when we tended to describe a "spurious regression" as one in which R² > DW.
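To see the Granger-Newbold symptoms for yourself, here's a minimal simulation sketch. It regresses one pure random walk on another, independent one, many times over, and records R² and DW. The sample size, number of replications, seed, and the helper name `ols_r2_dw` are arbitrary choices made for illustration. They aren't taken from the original paper.

```python
import numpy as np

def ols_r2_dw(y, x):
    """Regress y on a constant and x; return (R^2, Durbin-Watson)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / (resid @ resid)
    return r2, dw

rng = np.random.default_rng(0)   # arbitrary seed
T, reps = 200, 500               # illustrative settings only

# Each replication: two independent I(1) series built as cumulated noise.
results = np.array([
    ols_r2_dw(np.cumsum(rng.standard_normal(T)),
              np.cumsum(rng.standard_normal(T)))
    for _ in range(reps)
])

median_r2 = np.median(results[:, 0])
median_dw = np.median(results[:, 1])
share_r2_gt_dw = np.mean(results[:, 0] > results[:, 1])

print(f"median R^2          = {median_r2:.3f}")
print(f"median DW           = {median_dw:.3f}")
print(f"share with R^2 > DW = {share_r2_gt_dw:.2f}")
```

With stationary, independent series we'd expect R² close to zero and DW close to 2, so those are the benchmarks to compare the printed values against.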
Eventually, Phillips (1986) came up with an elegant formal analytical explanation for the behaviours of the OLS regression coefficient estimator, the usual t-statistics and F-statistic, and the R² and DW statistics in models estimated using non-stationary time-series data.
Specifically, Phillips introduced a new asymptotic theory that he then used to prove that in a spurious regression the DW statistic converges in probability to zero, the OLS parameter estimators and R² converge to non-standard limiting distributions, and the t-ratios and F-statistic diverge in distribution, as T ↑ ∞.
So, effectively Phillips “solved” the spurious regression problem. Moreover, he proved that we can't avoid the adverse consequences of modelling with integrated (but not cointegrated) data simply by increasing our sample size.
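A rough way to see these asymptotic results at work is to repeat the experiment for increasing T. The sketch below is only a simulation under assumed settings (three sample sizes, 300 replications, an arbitrary seed, and a hypothetical helper `t_and_dw`). It's no substitute for Phillips' proofs. It tracks the median absolute t-ratio on the slope, the share of nominal 5% "rejections", and the median DW.

```python
import numpy as np

def t_and_dw(y, x):
    """OLS of y on a constant and x; return (t-ratio on x, DW)."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = (resid @ resid) / (T - 2)            # usual error-variance estimate
    t_ratio = beta[1] / np.sqrt(s2 * XtX_inv[1, 1])
    dw = np.sum(np.diff(resid) ** 2) / (resid @ resid)
    return t_ratio, dw

rng = np.random.default_rng(1)   # arbitrary seed
reps = 300                       # illustrative only
summary = {}

for T in (100, 400, 1600):
    draws = np.array([
        t_and_dw(np.cumsum(rng.standard_normal(T)),
                 np.cumsum(rng.standard_normal(T)))
        for _ in range(reps)
    ])
    abs_t = np.abs(draws[:, 0])
    summary[T] = {
        "median_abs_t": np.median(abs_t),
        "reject_5pct": np.mean(abs_t > 1.96),  # nominal 5% two-sided test
        "median_dw": np.median(draws[:, 1]),
    }
    print(T, {k: round(float(v), 3) for k, v in summary[T].items()})
```

If the theory is doing its job, the median |t| and the rejection share should rise with T while the median DW shrinks towards zero, even though the two series are unrelated by construction.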
We now know that if the data are integrated, but not cointegrated, then they have to be filtered (typically, differenced) before being used in a regression model.
On the other hand, if the data are cointegrated, then we can legitimately estimate a model using the levels of the data. This will be the long-run equilibrating relationship between the variables. Alternatively, we can difference the data but include an "error correction term" in the regression model. The error correction model captures the short-run dynamics of the relationship between the variables.
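As a sketch of the cointegrated case, the code below simulates a pair of cointegrated series and estimates both regressions using one common approach, the Engle-Granger two-step procedure. That choice is an assumption for illustration, since no particular estimator is specified here. The data-generating process (long-run slope of 2, AR(1) equilibrium error with coefficient 0.5) and the helper `ols` are likewise hypothetical.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(2)   # arbitrary seed
T = 500

# Assumed DGP: x is a random walk; y = 1 + 2x + u with stationary AR(1) u,
# so y and x are both I(1) but cointegrated.
x = np.cumsum(rng.standard_normal(T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

# Step 1: levels regression (the long-run relationship).
beta_lr, ect = ols(y, np.column_stack([np.ones(T), x]))

# Step 2: regress the differenced data on a constant, the differenced
# regressor, and the lagged levels residual (the error correction term).
dy, dx = np.diff(y), np.diff(x)
Z = np.column_stack([np.ones(T - 1), dx, ect[:-1]])
beta_ecm, _ = ols(dy, Z)

print("long-run slope estimate:  ", round(float(beta_lr[1]), 3))
print("short-run slope estimate: ", round(float(beta_ecm[1]), 3))
print("error correction estimate:", round(float(beta_ecm[2]), 3))
```

Under this assumed DGP the implied error correction coefficient is 0.5 − 1 = −0.5, so a negative estimate on the lagged residual is what we'd hope to see: departures from the long-run relationship get partially corrected in the following period.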
Perhaps not surprisingly, various other statistics that we routinely report as part of our regression results also have strange properties when the data are integrated. That's to say, if we inadvertently estimate a "spurious regression", there'll be some other warning signals.
These are less well-known than the R², DW, t and F results noted above. For example, in Giles (2007) I proved the following results:
- The Jarque-Bera (JB) normality test. When applied to a "spurious regression" model, (T⁻¹JB) converges weakly as T ↑ ∞, and so JB itself diverges at the rate "T".
- The Breusch-Pagan-Godfrey homoskedasticity test. When applied to a "spurious regression" model, the statistic for the TR² version of the test diverges at the rate "T" as T ↑ ∞. The same is true for the (SSR/2) version of the test.
So, testing for the normality or homoskedasticity of the errors in a "spurious regression" will always lead to a rejection of the associated null hypotheses, for large enough T, whether these hypotheses are false or true.
Again, notice that these unfortunate properties are still there even if the sample size is infinitely large. They're driven by the characteristics of the data, not the amount of data that's available.
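These two divergence results can be illustrated (not proved) with a small simulation. The sketch below computes JB and the TR² form of the Breusch-Pagan-Godfrey statistic from the residuals of regressions between independent random walks, for several values of T. The function name `jb_and_bp`, the sample sizes, replication count and seed are all assumptions made for this illustration. The errors in the simulated random walks are i.i.d. normal, so any "rejections" here are artefacts of the spurious regression.

```python
import numpy as np

def jb_and_bp(y, x):
    """Return (Jarque-Bera, TR^2 Breusch-Pagan) from OLS of y on [1, x]."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                 # mean-zero because X has a constant

    # Jarque-Bera: T * (skew^2 / 6 + (kurtosis - 3)^2 / 24)
    m2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / m2 ** 1.5
    kurt = np.mean(e ** 4) / m2 ** 2
    jb = T * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)

    # TR^2 version: auxiliary regression of squared residuals on [1, x]
    e2 = e ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    v = e2 - X @ gamma
    aux_r2 = 1.0 - (v @ v) / np.sum((e2 - e2.mean()) ** 2)
    return jb, T * aux_r2

rng = np.random.default_rng(3)   # arbitrary seed
reps = 300                       # illustrative only
medians = {}

for T in (100, 400, 1600):
    stats = np.array([
        jb_and_bp(np.cumsum(rng.standard_normal(T)),
                  np.cumsum(rng.standard_normal(T)))
        for _ in range(reps)
    ])
    medians[T] = (np.median(stats[:, 0]), np.median(stats[:, 1]))
    rej_jb = np.mean(stats[:, 0] > 5.99)   # 5% critical value, chi-square(2)
    rej_bp = np.mean(stats[:, 1] > 3.84)   # 5% critical value, chi-square(1)
    print(f"T={T}: median JB={medians[T][0]:.1f}, median TR^2={medians[T][1]:.1f}, "
          f"reject JB={rej_jb:.2f}, reject BP={rej_bp:.2f}")
```

If both statistics diverge at rate "T", their medians should grow roughly in proportion to the sample size, and the rejection shares should climb towards one.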
Of course, one could argue that there's really no excuse for estimating "spurious regressions" these days. We have a whole raft of tests for non-stationarity and cointegration. However, many of these tests have quite low power, and they often lack robustness when there are structural breaks in the data, for example. So, "spurious regressions" can still occur, and the more warning signals we have, the better.
Giles, D. E. A., 2007. Spurious regressions with time-series data: Further asymptotic results. Communications in Statistics - Theory and Methods, 36, 967-979.
Granger, C. W. J. and P. Newbold, 1974. Spurious regressions in econometrics. Journal of Econometrics, 2, 111-120.
Phillips, P. C. B., 1986. Understanding spurious regressions in econometrics. Journal of Econometrics, 33, 311-340.
© 2012, David E. Giles