Friday, January 24, 2014

Testing Up, or Testing Down?

Students are told that if you're going to go in for sequential testing, when determining the specification of a model, then the sequence that you follow should be "from the general to the specific". That is, you should start off with a "large" model, and then simplify it - not vice versa.

At least, I hope this is what they're told!

But are they told why they should "test down", rather than "test up"? Judging by some of the things I read and hear, I think the answer to the last question is "no"!

The "general-to-specific" modelling strategy is usually attributed to David Hendry, and an accessible overview of the associated literature is provided by Campos et al. (2005).

Let's take a look at just one aspect of this important topic. 

To give us something concrete to refer to, I'll consider the problem of estimating a linear multiple regression model. However, the general points that I'll be making are not restricted to this simple situation. Suppose that we have K potential regressors, and n (> K + 1) sample observations.

We start off by estimating the model

                 y = β0 + β1x1 + β2x2 + ..... + βKxK + u   .

If you want something specific to hang on to here, you can assume that we're using OLS estimation. However, (again) the general points being made don't depend on any particular choice of estimator.

Next, we might test the significance of xK. That is, we test H0: βK = 0 against the appropriate alternative hypothesis, HA. The latter may be one-sided or two-sided, depending on our prior information about the sign of the potential impact of xK on y.

If we reject H0, we would keep xK in our model. Not rejecting H0 would lead us to drop this regressor from the model. Then, we might test H0: βK-1 = 0 against an appropriate alternative hypothesis, HA. If we reject this H0 we would keep xK-1 in the model; but if we can't reject H0 then we drop this regressor from the model. And so on.

This might sound fine, but there's a slightly different way of proceeding that actually makes more sense. Consider the following "nested" sequence of null and alternative hypotheses:

H0(1): βK = 0                                vs.         HA(1): βK ≠ 0                (or a 1-sided alternative)
H0(2): βK = βK-1 = 0                     vs.         H0(1)
H0(3): βK = βK-1 = βK-2 = 0          vs.         H0(2)
.   etc.

Note that each null hypothesis "nests" the preceding null hypothesis. Also, notice that each alternative hypothesis is the preceding null hypothesis, and not the overall maintained hypothesis that all of the coefficients are non-zero. We stop testing as soon as we reject one of the null hypotheses.

There's a very good reason why the sequence of increasingly restrictive ("nested") null hypotheses is tested in this particular way. Anderson (1971, pp.34-43, 116-134, 270-276) shows that this strategy is uniformly most powerful (UMP) in the family of testing procedures that fix the probability of accepting an hypothesis that is less restrictive than the true one.

Setting up the sequence of tests in the manner described above ensures that the test statistics are independent of each other. This won't be the case if we fail to assign the alternative hypotheses in this way, or if we test in a sequence that is increasingly less restrictive - that is, if we test "from the specific to the general", rather than "from the general to the specific".

Now, what have I neglected to mention so far? Nothing has been said about the significance levels that are being used at each step of the sequence of tests that we're conducting. The independence of the test statistics (which follows from a theorem of Basu, 1955 - see, also, Seber, 1964) enables us to keep track of the significance level of the overall testing strategy. 

Specifically, if αi is the significance level assigned to the ith test in the above sequence (i.e., when the null hypothesis is H0(i)), then the actual significance level of the implicit test of H0(s) against the overall maintained hypothesis that all of the coefficients are non-zero is [1 - (1 - α1)(1 - α2)...(1 - αs)], for s = 1, 2, ....

The product form follows from the independence of the test statistics.
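That formula is easy to verify directly. Here's a minimal sketch (the function name is my own) that computes the true size of the implicit test after s independent sequential tests:

```python
# Overall significance level of a nested testing sequence.
# With independent test statistics, the implicit test of H0(s) against the
# overall maintained hypothesis has true size 1 - (1-a1)(1-a2)...(1-as).

def overall_size(alphas):
    """True significance level after s independent sequential tests."""
    keep = 1.0
    for a in alphas:
        keep *= (1.0 - a)
    return 1.0 - keep

# Three steps, each at a nominal 5% level:
print(overall_size([0.05, 0.05, 0.05]))  # noticeably bigger than 5%
```

Even three steps at a nominal 5% each give a true size of about 14.26%, not 5%.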

This also suggests that we might want to assign a small value to α1, and then increase this value as i increases. Mizon (1977, p.110) suggests setting αi = (i ε / n), for i = 1, 2, ..., n. Here, n is the number of hypotheses in the "nested" sequence, and ε is chosen to satisfy the relation (1 - ε)^n = (1 - α), where α is the desired maximum significance level for the test of the most restrictive hypothesis against the overall maintained hypothesis.

Now, note that all of this discussion about testing from the general to the specific, and setting up the testing sequence in a way that ensures independence of the test statistics, holds regardless of the type of model that we're dealing with, or the type of estimators or tests that we're using. For example, we might be conducting likelihood ratio tests of non-linear restrictions on the parameters of a system of equations, as in Giles and Hampton (1985) for example.

Also, it has to be pointed out that conventional "step-wise regression" routines, of the type found in many statistics and econometrics packages, ignore the issues I've been discussing. Whether they are variable-addition or variable-deletion strategies, the tests they use are not independent of each other, and you really have no idea at all what real significance levels are being used (as opposed to the nominal levels that are being assigned at each step).

My advice - stay away from step-wise regression routines!

In the above set-up, the independence of the test statistics also ensures that there's no real "pre-test" testing issue to be concerned with here, although in general there'll still be pre-test estimation distortions that may be important if the sample size is relatively small. However, more on pre-testing in some later posts.

Finally, here's a simple example of the general-to-specific testing I've outlined. The time-series data I'm using are stationary, and free of structural breaks - here's a plot of the dependent variable, called LC:

I begin by estimating an AR(4) model, with a drift (intercept) term:

My overall maintained (alternative) hypothesis is that all of the coefficients in the model are non-zero.

First, I test if the coefficient of the AR(4) term is zero. I have a reasonably large sample, so the t-statistics in the above output are asymptotically standard normal, and I cannot reject this first null hypothesis, given the test statistic value of -1.456, if I have in mind a 5% significance level, and if I have no prior information about the sign of the coefficient on AR(4). Notice that the alternative hypothesis at this stage is actually the overall maintained hypothesis.

Next, my null hypothesis will be that the coefficients of both AR(3) and AR(4) are zero, and the alternative will be my previous null hypothesis. This means that the test has to be conducted in the context of a model from which AR(4) has been deleted:

Next, I effectively test whether the coefficients of both AR(3) and AR(4) are zero by testing whether the coefficient of AR(3) in this latest model is zero. Again, using a nominal significance level of 5%, I wouldn't reject this hypothesis.

So, I delete the AR(3) variable from the model:

To test the null that the coefficients of all of the AR(2), AR(3), and AR(4) terms are zero, against the previous null hypothesis as the current alternative hypothesis, I simply test if the AR(2) term is significant in the last output. At the 5% level, I reject this null hypothesis, so I stop the testing procedure.
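The stopping rule I've just applied can be sketched in a few lines. This is a minimal illustration on simulated data - not the LC series used above - and the sample size, true AR order, and function names are my own assumptions:

```python
# General-to-specific order selection for an AR model: starting from AR(4),
# drop the longest lag whenever its t-statistic is insignificant, and stop
# at the first rejection.
import numpy as np

rng = np.random.default_rng(42)

# Simulate a stationary AR(2) process (roots inside the unit circle).
T = 500
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

def ols_t_stats(y, p):
    """OLS fit of an AR(p) with intercept; returns t-statistics (const first)."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p - j: len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

p = 4
while p > 0:
    t_stats = ols_t_stats(y, p)
    if abs(t_stats[-1]) > 1.96:  # asymptotic 5% two-sided critical value
        break                    # reject H0: the longest lag matters -> stop
    p -= 1                       # can't reject -> drop the longest lag

print("selected AR order:", p)
```

Each pass re-estimates the smaller model before testing its longest remaining lag, which is exactly the point made above: the test of the joint null is conducted in the context of the model from which the previously-deleted lags have been removed.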

Now the question is, what is the actual significance level being used at this last step, against the overall maintained hypothesis that all four AR terms should be in the model? Well, it's just 1 - (1 - 0.05)^3, or 14.26%.

A final comment is in order. Usually, there isn't a unique ordering of the nested null hypotheses. Here, it may have seemed natural enough to follow the order that I used. Often, different orderings may suggest themselves, and you can end up with different final specifications for your model, depending on the nesting order that you follow. You may then end up with two or more competing "final" models that are themselves nested, in which case the procedures discussed above may be applied again. On the other hand, these "final" models may be non-nested, in which case you can use one of the usual information criteria (e.g., AIC, SIC) to select a preferred model.
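For the non-nested case, the information-criterion comparison is straightforward. Here's a minimal sketch (the data, variable names, and the Gaussian form of the AIC, with additive constants dropped, are my own choices for illustration):

```python
# Choosing between two non-nested "final" models by minimum AIC.
import numpy as np

def ols_aic(y, X):
    """Gaussian AIC for an OLS fit: n*log(SSR/n) + 2k, constants dropped."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    return n * np.log(resid @ resid / n) + 2 * k

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + rng.standard_normal(n)  # x2 is actually irrelevant

X_a = np.column_stack([np.ones(n), x1])  # model A: intercept + x1
X_b = np.column_stack([np.ones(n), x2])  # model B: intercept + x2
preferred = "A" if ols_aic(y, X_a) < ols_aic(y, X_b) else "B"
print("preferred model:", preferred)
```

The same comparison works with the SIC by replacing the 2k penalty with k*log(n); the SIC penalizes extra regressors more heavily once n exceeds about 7.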


Anderson, T. W., 1971. The Statistical Analysis of Time Series. Wiley, New York.

Basu, D., 1955. On statistics independent of a complete sufficient statistic. Sankhyā, 15, 377-380.

Campos, J., N. R. Ericsson, and D. F. Hendry, 2005. General-to-specific modelling: An overview and selected bibliography. International Finance Discussion Papers No. 838, Federal Reserve Board, Washington, D.C.

Giles, D. E. A.  and P. Hampton, 1985. An Engel curve analysis of household expenditure in New Zealand. Economic Record, 61, 450-462.

Mizon, G. E., 1977. Model selection procedures. In M. J. Artis and A. R. Nobay (eds.), Studies in Modern Economic Analysis. Blackwell, Oxford, 97-120.

Sargan, J. D., 1975. A suggested technique for computing approximations to Wald criteria with applications to dynamic specifications. Discussion Paper, LSE Econometrics Program.

Seber, G. A. F., 1964. Linear hypotheses and induced tests. Biometrika, 51, 41-47.

© 2014, David E. Giles


  1. I think there must be a typo in paragraph 10:

    "If we can't reject H0, we would keep xK in our model."

    Should be the opposite, I suppose (if we *can* reject H0, we keep xK).

    Great post, as usual!


    1. Owen - thanks for spotting this - now fixed.
      Best, Dave

  2. Nice post! I have only one question: is it safe to say that the series is stationary when one of the roots is 0.99? This doesn't add anything to the topic of the post, but I'd probably have fitted an ARIMA model.

    1. Yes, it's still stationary. I agree that an ARIMA model would be appropriate, but that's not the point of the post, as you obviously realize.

  3. Very useful post. Two questions: 1. Does this apply to bounds test for cointegration? 2. How would you do this when you have p lags of variable x1 and q lags of variable x2?

    1. Yes, it would apply if successive t-tests were used. Usually, though, in that context, all possible combinations of the lags on the variables are considered, and then the specification is chosen on the basis of minimum SIC or AIC.