Monday, May 26, 2014

Unit Root Testing: Sample Size vs. Sample Span

The more the merrier when it comes to the number of observations we have for our economic time-series data - right? Well, not necessarily. 

There are several reasons to be cautious, not the least of which include the possibility of structural breaks or regime-switching in the data-generating process. However, these are topics for future posts. Here, I want to discuss a different issue - namely, the impact of data frequency on the properties of tests for the stationarity of economic time-series.

To be specific, let's consider the following question: "Which is better when I'm applying the (augmented) Dickey-Fuller test - 20 annual observations for the series, or 80 quarterly observations?"

The short answer is that it really doesn't matter.

However, a couple of caveats are in order. I'm going to assume that any unit roots that may be present in the series are at the zero frequency. That is, I'm not going to consider unit roots at the seasonal frequencies. Clearly, we can't test for these using annual data. Moreover, if seasonal unit roots are present, then this should be taken into account when testing for unit roots at the zero frequency. For example, see Hylleberg et al. (1990).

The question that I've posed has received quite a bit of consideration in the time-series econometrics literature. A slightly more general way of posing the question is to ask if tests for unit roots are affected when we either temporally aggregate or selectively sample the data. The former process arises with flow variables when, for example, we add up monthly flows to get a quarterly or annual flow. The second process is associated with stock variables where we use a particular higher-frequency observation to represent the value of the lower-frequency variable. An example would be where we use the March-month unemployment rate as a measure of the rate for the March quarter.

In general, let's let "m" denote the frequency at which we selectively sample, or the number of periods that are aggregated in the case of a flow variable. As a detail, note that if, in the case of a stock variable, we average some higher-frequency values to get a low-frequency value, then this constitutes m-period temporal aggregation, with a scaling of (1 / m). An example would be using the average of four end-of-quarter values for the CPI to get an annual CPI measure (m = 4).
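To make the distinction concrete, here is a minimal sketch of the three operations just described. The function names are hypothetical, and the choice of the last observation in each block for selective sampling is an assumption (it matches the "March month for the March quarter" example).

```python
def temporally_aggregate(x, m):
    """Flow variable: add up each consecutive block of m observations."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) for i in range(n_blocks)]


def selectively_sample(x, m):
    """Stock variable: keep the last observation in each block of m (assumed convention)."""
    n_blocks = len(x) // m
    return [x[(i + 1) * m - 1] for i in range(n_blocks)]


def average_aggregate(x, m):
    """Stock variable averaged over m periods: temporal aggregation scaled by 1/m."""
    return [total / m for total in temporally_aggregate(x, m)]


# Two years of made-up monthly values, converted to quarterly with m = 3.
monthly = list(range(1, 25))
quarterly_flow = temporally_aggregate(monthly, 3)
quarterly_stock = selectively_sample(monthly, 3)
quarterly_average = average_aggregate(monthly, 3)
print(quarterly_flow)
print(quarterly_stock)
print(quarterly_average)
```

In each case 24 monthly values become 8 quarterly values: the sample size falls by a factor of m, but the time-span covered is unchanged.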

Now, suppose that we have a time series generated according to:

                          y_t = ρ y_{t-1} + u_t    ;  t = 1, 2, ..., T

where u_t follows a finite-order stationary ARMA process whose maximum AR modulus is less than ρ.

We want to test H0: ρ = 1 against a sequence of local alternatives. That is,  against  HA: ρ = exp(-c / T), where c > 0.

In this context, perhaps the most important result for you to be aware of is the following, from Pierse and Snell (1995, p.336):
“Any test that is asymptotically independent of nuisance parameters under both H0 and HA has  a limiting distribution under both H0 and HA that is independent of m.”
So, this means that, asymptotically, temporal aggregation or selective sampling has no consequences in terms of size distortion, or loss of power, for the ADF test, the Phillips-Perron test, or Hall's (1992) IV-based unit root test. The same is true for several other related tests.

Pierse and Snell also show that even in finite samples, this result holds quite well; and it also applies to tests of non-cointegration, such as those of Engle and Granger, and Johansen.
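If you want to get a feel for the finite-sample claim yourself, a small Monte Carlo experiment is easy to set up. What follows is only a rough sketch under simplifying assumptions of my own, and it does not reproduce the experiments in the paper: the errors are i.i.d. N(0, 1), the Dickey-Fuller regression has no constant, trend or augmentation terms, only selective sampling (not temporal aggregation) is considered, and the 5% critical value of -1.95 is the usual tabulated one for this no-constant case. The function names, the choice c = 10, and the seed are all hypothetical.

```python
import math
import random


def df_tstat(y):
    """t-ratio for gamma in the regression dy_t = gamma * y_{t-1} + e_t."""
    ylag = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in ylag)
    gamma = sum(v * d for v, d in zip(ylag, dy)) / sxx
    resid = [d - gamma * v for v, d in zip(ylag, dy)]
    s2 = sum(r * r for r in resid) / (len(dy) - 1)
    return gamma / math.sqrt(s2 / sxx)


def rejection_rate(rho, n_high=80, m=1, reps=1000, crit=-1.95, seed=12345):
    """Share of replications in which the DF test rejects a unit root.

    The series is generated at the high frequency (n_high observations)
    and then every m-th observation is kept before testing.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        level, y = 0.0, []
        for _ in range(n_high):
            level = rho * level + rng.gauss(0.0, 1.0)
            y.append(level)
        sampled = y[m - 1::m]  # m = 4 turns 80 "quarters" into 20 "years"
        if df_tstat(sampled) < crit:
            rejections += 1
    return rejections / reps


c = 10.0
rho_alt = math.exp(-c / 80)  # local alternative at the quarterly frequency

size_quarterly = rejection_rate(1.0, m=1)
size_annual = rejection_rate(1.0, m=4)
power_quarterly = rejection_rate(rho_alt, m=1)
power_annual = rejection_rate(rho_alt, m=4)

print("size :", size_quarterly, size_annual)
print("power:", power_quarterly, power_annual)
```

Because the seed is the same in every call, the quarterly and annual tests are applied to the same simulated paths, so any difference between the two columns is due to the sampling frequency alone.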

Let's look at a simple, illustrative example. The data I'll use are for imports of merchandise (c.i.f. basis) into New Zealand. The data are in nominal values, and I've seasonally adjusted them using the Census X-13 method, assuming a multiplicative decomposition of the time-series. The data were downloaded using the excellent, free, Infoshare facility provided by Statistics New Zealand.

(Snide remark: I hate to say it, but Statistics Canada could learn a few things from Statistics N.Z., and from plenty of other such agencies, when it comes to providing easy access to long-term economic time-series data.)

The fact that the data have been seasonally adjusted using the Census X-13 method, prior to our unit root testing, can raise some issues of its own. Again, this is something that's best left for a separate post.

The data that I've used are available on this blog's data page, and the EViews workfile is on the code page. This is what the monthly, quarterly and annual data look like:

Now let's apply some unit root tests. I've used both the ADF test (where the null hypothesis is that the series is I(1), and the alternative is that it is I(0)); and the KPSS test (where the null hypothesis is that the series is I(0), and the alternative is that it is I(1)).

(None of the series were found to be I(2).)

The ADF tests have been applied with an allowance for a drift and trend in the data, and the SIC was used to select the degree of augmentation, k. For the KPSS tests the bandwidth, b, was selected using the Newey-West method, with the Bartlett kernel.

Here are the results:

Sample                 T      k       ADF       b      KPSS
1960M1 - 2014M3       651     2     -2.050     21     0.709*
1960Q1 - 2014Q1       217     0     -2.027     11     0.456*
1960 - 2013            54     4     -1.409      5     0.252*

* Significant at the 1% level, based on asymptotic critical values.

As you can see, the results strongly support the presence of a unit root in the imports time-series, regardless of the degree of aggregation of the data (and hence the sample size, T) over the 54-year time-span under consideration here.
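The way the two tests are being read together can be spelled out in a few lines. This is just a sketch: the statistics are the ones reported above, but the critical values are assumptions on my part, namely the asymptotic 5% value for the ADF test with drift and trend (-3.41) and the asymptotic 1% value for the KPSS test in the trend case (0.216). The finite-sample ADF values are a little more negative, which doesn't alter any of the outcomes here. The `classify` helper is hypothetical.

```python
ADF_CRIT_5PCT = -3.41    # assumed: asymptotic, drift + trend
KPSS_CRIT_1PCT = 0.216   # assumed: asymptotic, trend case

# (sample, T, ADF statistic, KPSS statistic), as reported in the post
table = [
    ("1960M1 - 2014M3", 651, -2.050, 0.709),
    ("1960Q1 - 2014Q1", 217, -2.027, 0.456),
    ("1960 - 2013", 54, -1.409, 0.252),
]


def classify(adf_stat, kpss_stat):
    """Combine an ADF test (null: I(1)) with a KPSS test (null: I(0))."""
    adf_rejects_unit_root = adf_stat < ADF_CRIT_5PCT
    kpss_rejects_stationarity = kpss_stat > KPSS_CRIT_1PCT
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "I(1)"
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "I(0)"
    return "conflicting"


decisions = {sample: classify(adf, kpss) for sample, _, adf, kpss in table}
for sample, decision in decisions.items():
    print(sample, "->", decision)
```

The two tests agree at every frequency: the ADF test fails to reject a unit root, and the KPSS test rejects stationarity.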


Hall, A., 1992. Testing for a unit root in time series using instrumental variable estimators with pretest data based model selection. Journal of Econometrics, 54, 223-250.

Hylleberg, S., R. F. Engle, C. W. J. Granger, and B. S. Yoo, 1990. Seasonal integration and cointegration. Journal of Econometrics, 44, 215-238.     

Pierse, R. G. and J. Snell, 1995. Temporal aggregation and the power of tests for a unit root. Journal of Econometrics, 65, 333-345.

© 2014, David E. Giles


  1. Dear Professor,

    All your posts help students a lot.

    If you can write a post on "Time-varying Parameter Estimation", it would be really helpful.

    Thanking you.

  2. Yes Santosh,

    Time-varying coefficients are also important in detecting the G-causalities of non-stationary variables. Dave Giles explains the topics very nicely. I hope he will cover it soon.

  3. Dear professor, again I must thank you for the opportunity you give us in finding the best and correct answer to (almost all) our questions regarding econometrics. I was troubled yesterday by this very question related to small sample size of annual data vs. monthly/quarterly data in unit root testing and cointegration procedure and now I found my "peace" :))

  4. This result precludes stochastic volatility.

  5. Dear Dave Giles,

    First let me thank you for a very insightful blog! I find your explanations extremely concise and have used them often for my work.

    I wish to ask you a question somewhat related to the post above. In practice we often face short series. Suppose, for example, that we only have 5 years of quarterly data on two series, GDP growth and interest rate spread. Suppose in addition that we know, based on theory, that both should be stationary and that, due to a low power in small samples, unit root tests cannot reject the null of unit root (DF, ADF, PP) and at the same time cannot reject the null of stationarity (KPSS).

    Could I ask you, what do you recommend in such cases, to model the two variables in levels or rather difference them further? What are the consequences of using the first or the second option? Do you know of any references that investigate this issue?

    I sincerely thank you for any help you might provide.

    Kind regards,

    1. Vasja - thanks for your comment. I'd say, "If in doubt, difference the data." The reason for saying this is:

      1. If any of the data are non-stationary and you fail to difference them, then unless they happen to be cointegrated the results will be meaningless.

      2. If the data are actually stationary, but you difference them, they will still be stationary, although not I(0). "Over-differencing" the data is not totally desirable, but the penalties associated with this are much less than those associated with case 1 above.

  6. Dear Prof.,
    Thank you for sharing with us your knowledge, it is of great help.
    If I'm using ADF and get this warning "Warning: Probabilities and critical values calculated for 20 observations and may not be accurate for a sample size of 18"....
    I used KPSS and it shows that the time series is stationary. Is that enough, and reliable?
    Thank you for your help.
    Best Regards,

    1. Yasmine - thanks for the question. First, make sure that the critical values used for the KPSS test were the finite-sample critical values for n=18, and not just the asymptotic critical values. You can get the exact critical values for the ADF test when n=18 from .
      There's only a small difference between these and the ones for n=20, so you're probably OK, as long as your ADF test results aren't "borderline". My (probably unhelpful) suggestion - get a larger sample!

    2. Thank you professor for your prompt reply, that was very helpful. It is difficult to get a larger sample.

  7. Respected Prof.
    Your blog is really very helpful. Your explanations are easy to understand and concise. Even a scholar from a non-econometric background, like me, can understand it.
    I am dealing with time series of very few observations. Company wise it is varying say from 6-11 years. Before running regression I need to check stationarity. But different adf models are giving different results. My questions are do I need to check stationarity for such small sample? If yes then which method e.g. adf, kpss, pp should I use? If adf, then which model to be considered? I am using numXl software, an add-on to excel.
    I am stuck badly in my research for good many months as unable to get any workable solution. Kindly suggest me. It will be of great help. Thanking you in anticipation of prompt reply.

    1. Thanks for your comments. You certainly do have a very short span of data, and I'd be surprised if you can model anything sensibly with so few observations. Having said that..... (1) Yes, stationarity is always important; (2) I'm not surprised that the different tests are giving conflicting results; (3) There's no simple choice of test; (4) With ADF, the inclusion of drift and/or trend in the ADF regression depends on the features of the data.
      My suggestion - don't worry about testing for stationarity. Just first-difference all the series and then estimate the model using these data. Why? If any series is in fact non-stationary, differencing will deal with this. What if a series is I(0)? Then differencing the data is unnecessary, but the resulting series will still be stationary (but not I(0)). It's better to over-difference than to run the risk of a "spurious regression" by using un-differenced I(1) data.
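      The "still stationary, but not I(0)" point can be checked with a short simulation. This is a minimal sketch, assuming Gaussian white noise as the I(0) series; the helper name and the seed are hypothetical. Differencing white noise gives an MA(1) process with a unit root in its MA part, whose lag-1 autocorrelation is -0.5 in theory.

```python
import random


def lag1_autocorr(x):
    """Sample first-order autocorrelation coefficient."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(42)
e = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # an I(0) series (white noise)
de = [b - a for a, b in zip(e[:-1], e[1:])]      # the over-differenced series

r_level = lag1_autocorr(e)
r_diff = lag1_autocorr(de)

# The differences telescope: their sum is just the last value minus the first,
# so the partial sums never wander off. This is why the series is not I(0).
telescoped = e[-1] - e[0]

print("lag-1 autocorrelation, levels     :", round(r_level, 3))
print("lag-1 autocorrelation, differences:", round(r_diff, 3))
print("sum of differences vs. e[-1] - e[0]:", sum(de), telescoped)
```

      The differenced series has a constant mean and variance, so it is still stationary, but the differencing has induced strong negative autocorrelation that wasn't in the original data. That is the (modest) price of over-differencing.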

  8. Respected Sir,
    Thank you so much for your prompt reply. It really helped me. I have one more query. You have suggested differencing of all the series irrespective of stationarity test but after differencing I am getting negative values to a large number. Now for log transformation, I need to add minimum positive value to almost all series. On the other hand if log transformation is followed by stationarity test, then differencing log transformed data will actually mean rate/ ratio. What do you suggest? I am in dilemma. In anticipation of your advice.

    Warm Regards


    1. With the negative values you simply can't use logs. You SHOULD NOT add a value to make the number positive - NEVER!

  9. Certainly, it is a wonderful blog. I have a question, if your model has few variables, some series are stationary I(0) and some I(1) or I(2) characterizing different cointegration orders, how do you step forward?
    Your kind reply will be much appreciated.

    1. Syed - see

  10. Dear Prof,

    Thank you for your precious advice

    I have to check the stationarity of a series of annual stock price index over the period 1997-2017. In this case, I have 21 observations of annual data, but when I run the ADF test (SIC used to select maximum lags with automatic selection=4) the included observations after adjustments become 19 and this appears "Warning: Probabilities and critical values calculated for 20 observations and may not be accurate for a sample size of 19" and this is the case of the following 1/ in Level: with intercept 2/ in Level with Trend and intercept 3/ in First difference with Intercept 4/ in First difference with Trend and Intercept 5/ in First difference with None. Note that the series becomes stationary in first difference with None. My question is the following: Should I ignore the warning and conclude that the series is stationary at first difference I(1) without trend or intercept? What do you suggest?

    Best regards

    1. Hi - you can compute the critical values for any sample size using James MacKinnon's updated table:

      See pages 9 and 10. Note that the sample size is denoted "T", and "N" is the number of variables. So, for the case where N=1, you are computing critical values for the ADF test of a unit root. When N=2 you are computing critical values for the Engle-Granger cointegration test, etc. By way of an example, if T=19, and you want the 5% critical value for the ADF test with a drift (constant) and trend, the number will be c = [-3.4126 - (4.039/19) - (17.83/(19^2))] = -3.675.
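      The same arithmetic is easy to wrap in a small function so that it can be repeated for other sample sizes. This is only a sketch: the function name is hypothetical, and the three coefficients are simply the ones quoted in the example above (N=1, drift and trend, 5% level). For any other case the coefficients have to be looked up in the table.

```python
def response_surface_critical_value(T, beta_inf, beta_1, beta_2):
    """Finite-sample critical value: beta_inf + beta_1 / T + beta_2 / T^2."""
    return beta_inf + beta_1 / T + beta_2 / T ** 2


# Coefficients quoted in the example above: ADF, constant + trend, 5% level.
BETA_INF, BETA_1, BETA_2 = -3.4126, -4.039, -17.83

cv_19 = response_surface_critical_value(19, BETA_INF, BETA_1, BETA_2)
print("T = 19:", round(cv_19, 3))

# As T grows, the value approaches the asymptotic one, beta_inf.
for T in (25, 50, 100, 500):
    print("T =", T, ":", round(response_surface_critical_value(T, BETA_INF, BETA_1, BETA_2), 3))
```

      Note that the critical value becomes more negative as T falls, so with a very short sample the ADF statistic has to be further out in the tail before you can reject the unit root.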

    2. Thank you for your prompt response, I really appreciate your help.

    3. Dear Prof.
      I have one more question, I checked the stationarity with three different tests (ADF, PP and KPSS). I found the same results for both ADF and PP tests (the series is first difference stationary (with no trend and no constant)). But when using the KPSS test, I found a different result: I could not reject the null hypothesis of stationarity in level which means that the series is stationary in level (under a constant and constant plus trend). I am a little bit confused about the conclusion that I have to make: Is the series first difference stationary I(1) according to the ADF and PP tests, or is it stationary in level I(0) according to the KPSS test? Or should I simply put the results as they are and conclude for each test separately? Thanks in advance

    4. This happens all the time! Keep in mind that each of these tests has different power properties, and the KPSS test has the reverse null & alternative hypotheses from the other two tests. Especially when the sample size is relatively small, or there is some structural break that you haven't been able to detect, conflicting results can arise. Ideally, try and report the conflicting results (if only in a footnote). However, generally you still have to come to a choice! If there is any chance that the series is I(1) rather than I(0), then treat it as being non-stationary. The reason is simply that the (statistical) "costs" you incur when you wrongly treat a series as being stationary are usually MORE than if you wrongly treat it as being non-stationary. For instance, suppose it is I(1) but you treat it as I(0); then this can be a big problem. On the other hand, if it is really I(0) but you treat it as being I(1), and decide to difference it, then the resulting series will still be stationary (even though it is not I(0) after being differenced). A final comment - this inability to detect I(1) and I(0) series with perfection is precisely what makes the use of an ARDL model so appealing in many situations. With an ARDL model you don't have to know which series are I(0) and which ones are I(1). You just have to be confident that none of them are I(2).

    5. Thank you very much for the clarification. Thank you for your time!

  11. Dear Prof
    I am working with time-series data with only 8 observations. I want to know if a unit root test is really necessary. If not, what step do I take?

    1. Stationarity of the data is ALWAYS crucial. However, with only 8 observations, I really don't think that you're going to be able to do any serious modelling. Sorry!

