## Friday, January 6, 2017

### Explaining the Almon Distributed Lag Model

In an earlier post I discussed Shirley Almon's contribution to the estimation of Distributed Lag (DL) models, with her seminal paper in 1965.

That post drew quite a number of email requests for more information about the Almon estimator, and how it fits into the overall scheme of things. In addition, Almon's approach to modelling distributed lags has been used very effectively more recently in the estimation of the so-called MIDAS model. The MIDAS model (developed by Eric Ghysels and his colleagues - e.g., see Ghysels et al., 2004) is designed to handle regression analysis using data with different observation frequencies. The acronym, "MIDAS", stands for "Mixed-Data Sampling". The MIDAS model can be implemented in R, for instance (e.g., see here), as well as in EViews. (I discussed this in this earlier post.)

For these reasons I thought I'd put together this follow-up post by way of an introduction to the Almon DL model, and some of the advantages and pitfalls associated with using it.

Let's take a look.
Suppose that we want to estimate the coefficients of the following DL model:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (1)$$

This is called a "finite" DL model if the value of n is finite.

We could add an intercept into the model, and/or add other regressors, but that won't alter the basic ideas in the following discussion. So let's keep the model as simple as possible. We'll presume that the error term, ut, satisfies all of the usual assumptions - but that can be relaxed too.

If the maximum lag length in the model, n, is much less than T, then we could just apply OLS to estimate the regression coefficients. However, even if this is feasible, in the sense that there are positive degrees of freedom, this may not be the smartest way in which to proceed. For most economic time-series, x, the successive lags of the variable are likely to be highly correlated with each other. Inevitably, this will result in quite severe multicollinearity.
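To get a feel for how strongly successive lags can be correlated, here's a minimal sketch. It assumes a hypothetical regressor that follows an AR(1) process with coefficient 0.9; the process, the sample size, and the helper `lag_correlation` are illustrative choices, not anything taken from Almon's paper.

```python
import random

def lag_correlation(x, k):
    """Sample correlation between x_t and x_{t-k}."""
    a, b = x[k:], x[:-k]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Hypothetical regressor: a persistent AR(1) series, x_t = 0.9 x_{t-1} + e_t.
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))

# Correlations between x_t and its first three lags.
for k in (1, 2, 3):
    print(f"corr(x_t, x_t-{k}) = {lag_correlation(x, k):.2f}")
```

With a series this persistent, the lagged regressors in (1) carry very similar information, which is exactly the situation in which OLS estimates of the individual coefficients become imprecise.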

How can we deal with this?

In response, Shirley Almon (1965) suggested a pretty neat way of re-formulating the model prior to its estimation. She made use of Weierstrass's Approximation Theorem, which tells us (roughly) that: "Every continuous function defined on a closed interval [a, b] can be uniformly approximated, arbitrarily closely, by a polynomial function of finite degree, P."

Notice that the theorem doesn't tell us what the value of P will be. This presents a type of model-selection problem that we have to solve. The flip-side of this is that if we select a value for P, and get it wrong, then there will be model mis-specification issues that we have to face. In fact, we can re-cast these issues in terms of those associated with the incorrect imposition of linear restrictions on the parameters of our model.

(Almon actually used Lagrangian interpolation in her application of Weierstrass's Theorem to this problem, but there's a simpler (and numerically equivalent) way of describing her idea.)

Let's look into this model in more detail.

Here's equation (1) again:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (1)$$

What we're going to do is to treat the values of the regression coefficients, $\beta_i$, as unknown functions of $i$. That is, we'll set $\beta_i = g(i)$. Then we'll approximate $g(i)$ using a polynomial, $f(i)$, of order P. Typically, P will take a small value, such as 2, 3, or 4.

That is, we'll write:

$$\beta_i = a_0 + a_1 i + a_2 i^2 + \dots + a_P i^P \; ; \qquad i = 0, 1, 2, \dots, n \qquad (2)$$

If we set P = 3, here's an example of what we're imposing on the problem:

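Substituting (2) into (1), we get:

$$
\begin{aligned}
y_t = {} & a_0 x_t + (a_0 + a_1 + a_2 + \dots + a_P)\, x_{t-1} + (a_0 + 2a_1 + 4a_2 + \dots + 2^P a_P)\, x_{t-2} + \dots \\
& + (a_0 + n a_1 + n^2 a_2 + \dots + n^P a_P)\, x_{t-n} + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (3)
\end{aligned}
$$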

Re-arranging the right-hand side of (3), and gathering up terms, we get:

$$
\begin{aligned}
y_t = {} & a_0 (x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n}) + a_1 (x_{t-1} + 2x_{t-2} + \dots + n x_{t-n}) \\
& + a_2 (x_{t-1} + 4x_{t-2} + 9x_{t-3} + \dots + n^2 x_{t-n}) + \dots \\
& + a_P (x_{t-1} + 2^P x_{t-2} + \dots + n^P x_{t-n}) + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (4)
\end{aligned}
$$

If we've decided on a maximum lag-length (n), and we have chosen a degree (P) for the approximating polynomial, f(.), then we can re-write (4) as:

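$$y_t = a_0 z_{0t} + a_1 z_{1t} + a_2 z_{2t} + \dots + a_P z_{Pt} + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (5)$$

where:

$$
\begin{aligned}
z_{0t} &= x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n} \\
z_{1t} &= x_{t-1} + 2x_{t-2} + \dots + n x_{t-n} \\
z_{2t} &= x_{t-1} + 4x_{t-2} + 9x_{t-3} + \dots + n^2 x_{t-n} \\
&\;\;\vdots \\
z_{Pt} &= x_{t-1} + 2^P x_{t-2} + \dots + n^P x_{t-n} .
\end{aligned}
$$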

Notice that if P is much smaller than n in value, then the number of regression coefficients that have to be estimated in (5) is much less than in (1). We have effectively imposed (n - P) exact linear restrictions on the original coefficient vector, so what we have here is a particular application of restricted least squares. If these restrictions are incorrect, then there will be serious implications for the properties of our final estimator. On the positive side, however, the z variables are likely to exhibit far less multicollinearity than do the successive lags of x itself in model (1).
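The construction of the z variables is mechanical, and a short sketch may help. The function name `almon_z` and the toy data are assumptions made purely for illustration; the weights are just the powers of the lag index, with the weight on x_t in z_0t taken to be 1.

```python
def almon_z(x, n, P):
    """Return rows [z_0t, ..., z_Pt] for every t that has n lags available.

    z_jt = sum over i = 0..n of (i ** j) * x_{t-i}, with 0 ** 0 taken as 1.
    """
    rows = []
    for t in range(n, len(x)):
        rows.append([
            sum((i ** j) * x[t - i] for i in range(n + 1))
            for j in range(P + 1)
        ])
    return rows

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy data, purely illustrative
Z = almon_z(x, n=2, P=2)
print(Z[0])                      # [6.0, 4.0, 6.0]
```

For the first usable observation, x_t = 3, x_{t-1} = 2 and x_{t-2} = 1, so z_0t = 3 + 2 + 1, z_1t = 2 + 2(1), and z_2t = 2 + 4(1). Note that the first n observations are lost in forming the lags.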

For a given n and P, we can construct the z variables, then estimate $a_0, a_1, \dots, a_P$ by applying OLS to (5), and finally "recover" the estimates for the $\beta_i$'s using (2):

$$\beta_i^* = a_0^* + a_1^* i + a_2^* i^2 + \dots + a_P^* i^P \; ; \qquad i = 0, 1, 2, \dots, n \qquad (6)$$

where a * superscript denotes an OLS estimate.

Because the relationship between the $\beta_i^*$'s and the $a_j^*$'s in (6) is a linear one, it is trivial to "recover" the standard errors for the former estimates from the covariance matrix associated with the latter estimates.
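Here's a rough sketch of the whole procedure, using NumPy and simulated data. Writing (6) as β* = H a*, where row i of H is (1, i, i², ..., i^P), the covariance matrix of β* is H V H', where V is the usual OLS covariance matrix of a*. The function name `almon_fit`, the parameter values, and the data-generating process are all assumptions made for illustration only.

```python
import numpy as np

def almon_fit(y, x, n, P):
    """Almon DL estimates of beta_0..beta_n, with their standard errors."""
    T = len(x)
    # Row i of H holds (1, i, i^2, ..., i^P), so that beta = H a.
    H = np.vander(np.arange(n + 1), P + 1, increasing=True).astype(float)
    # Lag matrix: column i holds x_{t-i} for the usable observations.
    X = np.column_stack([x[n - i:T - i] for i in range(n + 1)])
    Z = X @ H                                  # the z variables in (5)
    yy = y[n:]
    a_hat, *_ = np.linalg.lstsq(Z, yy, rcond=None)
    resid = yy - Z @ a_hat
    s2 = resid @ resid / (len(yy) - (P + 1))   # usual OLS error variance
    V_a = s2 * np.linalg.inv(Z.T @ Z)          # covariance matrix of a*
    beta_hat = H @ a_hat                       # equation (6)
    V_beta = H @ V_a @ H.T                     # linear map, so H V H'
    return beta_hat, np.sqrt(np.diag(V_beta))

# Simulated data whose lag weights really do lie on a quadratic.
rng = np.random.default_rng(0)
n, P, T = 6, 2, 200
a_true = np.array([0.5, 0.3, -0.05])           # hypothetical values
beta_true = np.vander(np.arange(n + 1), P + 1, increasing=True) @ a_true
x = rng.standard_normal(T)
y = np.zeros(T)
for i in range(n + 1):
    y[n:] += beta_true[i] * x[n - i:T - i]
y[n:] += 0.1 * rng.standard_normal(T - n)

beta_hat, se = almon_fit(y, x, n, P)
print(np.round(beta_hat, 2))
print(np.round(se, 3))
```

In this sketch the polynomial restrictions hold by construction, so the recovered lag weights should sit close to the true ones. With real data that is precisely what we don't know.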

All of this is described in some detail in an old discussion paper by Smith and Giles (1976), referenced below.

Now let's consider a specific example, to make all of this more "concrete".

An Example

Here's the original model, again:

$$y_t = \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_n x_{t-n} + u_t \; ; \qquad t = 1, 2, \dots, T. \qquad (7)$$

Let's choose P = 2. This is very restrictive indeed, in terms of the "shapes" that the lag distribution can take. However, it will simplify the discussion here. More complex (and realistic) cases are discussed in detail by Smith and Giles (1976).

So, we have:

$$\beta_i = a_0 + a_1 i + a_2 i^2 \; ; \qquad i = 0, 1, 2, \dots, n \qquad (8)$$

which implies that

$$
\begin{aligned}
y_t = {} & a_0 x_t + (a_0 + a_1 + a_2)\, x_{t-1} + (a_0 + 2a_1 + 4a_2)\, x_{t-2} + \dots + (a_0 + n a_1 + n^2 a_2)\, x_{t-n} \\
& + u_t \; ; \qquad t = 1, 2, \dots, T
\end{aligned}
$$

or,

$$
\begin{aligned}
y_t = {} & a_0 (x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n}) + a_1 (x_{t-1} + 2x_{t-2} + \dots + n x_{t-n}) \\
& + a_2 (x_{t-1} + 4x_{t-2} + \dots + n^2 x_{t-n}) + u_t \; ; \qquad t = 1, 2, \dots, T
\end{aligned}
$$

or,

$$y_t = a_0 z_{0t} + a_1 z_{1t} + a_2 z_{2t} + u_t \; ; \qquad t = 1, 2, \dots, T \qquad (9)$$

Here,

$$
\begin{aligned}
z_{0t} &= x_t + x_{t-1} + x_{t-2} + \dots + x_{t-n} \\
z_{1t} &= x_{t-1} + 2x_{t-2} + \dots + n x_{t-n} \\
z_{2t} &= x_{t-1} + 4x_{t-2} + \dots + n^2 x_{t-n} \; ; \qquad t = 1, 2, \dots, T
\end{aligned}
$$

We construct the z variables; estimate the coefficients in (9) by OLS; and then create estimates of the original $\beta_i$'s using (8). Effectively, we now have (particular) restricted least squares estimates of the original coefficients in (7). Everything that you know about restricted least squares applies here!
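To see one of these restrictions explicitly, take a deliberately tiny case: n = 3 with P = 2 (values chosen only for illustration). There are four lag coefficients and three polynomial coefficients, so there is n - P = 1 exact linear restriction. Because the third difference of a quadratic is zero, that restriction can be written as follows:

$$
\begin{aligned}
\beta_0 &= a_0 \\
\beta_1 &= a_0 + a_1 + a_2 \\
\beta_2 &= a_0 + 2a_1 + 4a_2 \\
\beta_3 &= a_0 + 3a_1 + 9a_2
\end{aligned}
\qquad \Longrightarrow \qquad
\beta_3 - 3\beta_2 + 3\beta_1 - \beta_0 = 0 .
$$

Checking the coefficients term by term: for $a_0$ we get 1 - 3 + 3 - 1 = 0, for $a_1$ we get 3 - 6 + 3 = 0, and for $a_2$ we get 9 - 12 + 3 = 0. Estimating (9) by OLS is then equivalent to estimating (7) subject to this restriction.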

An Extension

Often, we'll have economic information that will suggest something further about the pattern (lag distribution) that the values of the βi's should follow. For instance, we may know that it makes sense for the lag weights to "die out" to zero when i = n+1. Or we may want the slope of the lag distribution to be zero when i = n. There are lots of such pieces of prior information that we may want to impose on the problem, and some of these are discussed by Smith and Giles (1976), together with graphs and details of the associated formulae.

Of course, these shape restrictions add to those already in play as a result of choosing a value for P. They further increase the chance that we may be imposing false restrictions on the parameter space, and this would lead our OLS estimates to be both biased and inconsistent. So, extreme care should be taken, and there are some important model-selection issues to be taken into account here.

Let's illustrate this by extending the previous example. We'll stick with the choice of P = 2, but we'll add the restriction that the derivative of f(i) should be zero when i = n. This is Case 5 in the paper by Smith and Giles (1976).

Noting that $\beta_i = f(i) = a_0 + a_1 i + a_2 i^2$, it follows that $f'(i) = a_1 + 2 a_2 i$, and we're going to set

$$f'(n) = a_1 + 2 n a_2 = 0 ,$$

implying that $a_1 = -2 n a_2$.

Now we can eliminate a1 from the problem (that's another linear restriction that we're imposing, right there).

Looking back at equation (9) we can see that the resulting equation that we'll be estimating is now of the form:

$$y_t = a_0 z_{0t} + a_2 (z_{2t} - 2n z_{1t}) + u_t \; ; \qquad t = 1, 2, \dots, T \qquad (10)$$

Then, the estimates of the original regression coefficients are

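$$\beta_i^* = a_0^* + a_1^* i + a_2^* i^2 \; ; \qquad i = 0, 1, 2, \dots, n ,$$

where $a_0^*$ and $a_2^*$ are the OLS estimates from (10), and $a_1^* = -2n a_2^*$ follows from the restriction.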
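As a rough sketch of this restricted version, the code below builds the two regressors in (10), estimates a0 and a2 by OLS, and then backs out a1 and the lag weights. The function name `almon_endpoint_fit`, the simulated data, and the parameter values are hypothetical; the data are generated so that the zero-slope restriction at i = n actually holds.

```python
import numpy as np

def almon_endpoint_fit(y, x, n):
    """Quadratic Almon fit (P = 2) with zero slope imposed at i = n."""
    T = len(x)
    i = np.arange(n + 1)
    # Lag matrix: column k holds x_{t-k} for the usable observations.
    X = np.column_stack([x[n - k:T - k] for k in range(n + 1)])
    z0, z1, z2 = X @ np.ones(n + 1), X @ i, X @ i**2
    W = np.column_stack([z0, z2 - 2 * n * z1])  # regressors in (10)
    (a0, a2), *_ = np.linalg.lstsq(W, y[n:], rcond=None)
    a1 = -2 * n * a2                            # from f'(n) = 0
    return a0 + a1 * i + a2 * i**2              # beta_0*, ..., beta_n*

# Simulated data whose lag weights satisfy the end-point restriction.
rng = np.random.default_rng(1)
n, T = 4, 300
idx = np.arange(n + 1)
a0_true, a2_true = 1.0, -0.05                   # hypothetical values
beta_true = a0_true - 2 * n * a2_true * idx + a2_true * idx**2
x = rng.standard_normal(T)
y = np.zeros(T)
for k in range(n + 1):
    y[n:] += beta_true[k] * x[n - k:T - k]
y[n:] += 0.1 * rng.standard_normal(T - n)

beta_hat = almon_endpoint_fit(y, x, n)
print(np.round(beta_hat, 2))
```

Only two coefficients are estimated here, rather than the n + 1 = 5 in the unrestricted DL model, which is where both the gain in precision and the risk of imposing a false restriction come from.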

So, what are the take-away messages here? They can be summarized pretty simply:
• The Almon estimator provides a rather neat way of circumventing the multicollinearity problems that would arise if we simply estimated a DL model, with lots of lags, directly by OLS.
• It does this by approximating the "shape" of the distribution of the lag coefficients through time  by a polynomial of order P.
• The value of P has to be chosen by the user, and this leads to a model-selection problem.
• The choice of P also affects the form of certain exact linear restrictions that are effectively being placed on the regression coefficients.
• This leads to the possibility that false restrictions are imposed, and this would lead to the resulting estimator being both biased and inconsistent.
• Additional restrictions can be placed on the lag distribution, based on our knowledge of the underlying economics of the relationship we're estimating.
• Applying such restrictions should also be undertaken with care, again to avoid adversely affecting the properties of our estimator.
• The validity of some of the restrictions can be tested in the usual way. For instance, when we have fixed P and then we add end-point restrictions, the latter restrictions are "nested", so we can use a Wald test (for instance).
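
As an illustration of the last point, here's a hedged sketch of a Wald test of the zero-slope end-point restriction, a1 + 2n a2 = 0, computed from the unrestricted quadratic model. The function name `wald_endpoint` and the simulated data are assumptions for illustration; the data are deliberately generated so that the restriction is false, and the statistic is compared with 3.84, the 5% critical value of a chi-square distribution with one degree of freedom.

```python
import numpy as np

def wald_endpoint(y, x, n):
    """Wald statistic for H0: a1 + 2 n a2 = 0 in the unrestricted P = 2 model."""
    T = len(x)
    # Row i of H holds (1, i, i^2), so that beta = H a.
    H = np.vander(np.arange(n + 1), 3, increasing=True).astype(float)
    X = np.column_stack([x[n - k:T - k] for k in range(n + 1)])
    Z, yy = X @ H, y[n:]
    a_hat, *_ = np.linalg.lstsq(Z, yy, rcond=None)
    resid = yy - Z @ a_hat
    V = (resid @ resid / (len(yy) - 3)) * np.linalg.inv(Z.T @ Z)
    r = np.array([0.0, 1.0, 2.0 * n])           # restriction vector
    return float((r @ a_hat) ** 2 / (r @ V @ r))

# Simulated data for which the slope at i = n is NOT zero.
rng = np.random.default_rng(2)
n, T = 6, 200
a_true = np.array([0.5, 0.3, -0.05])            # a1 + 2 n a2 = -0.3
beta_true = np.vander(np.arange(n + 1), 3, increasing=True) @ a_true
x = rng.standard_normal(T)
y = np.zeros(T)
for k in range(n + 1):
    y[n:] += beta_true[k] * x[n - k:T - k]
y[n:] += 0.1 * rng.standard_normal(T - n)

W = wald_endpoint(y, x, n)
print(f"Wald statistic = {W:.1f}; reject at 5% if it exceeds 3.84")
```

Keep in mind that a test like this takes the choice of P as given. It says nothing about whether the polynomial restrictions themselves are valid.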

It's worth commenting that the choice of P, and some of the potential model-selection and mis-specification issues that can plague the Almon DL estimator, can also be resolved in a straightforward manner if one takes a Bayesian approach to the problem. For more details, and an empirical application, see Giles (1977).

References

Almon, S., 1965. The distributed lag between capital appropriations and net expenditures. Econometrica, 33, 178-196.

Ghysels, E., P. Santa-Clara, & R. Valkanov, 2004. The MIDAS touch: Mixed data sampling regression models. Mimeo.

Giles, D.E.A., 1977. Current payments for New Zealand's imports: A Bayesian analysis. Applied Economics, 9, 185-201.

Smith, R.G. & D.E.A. Giles, 1976. The Almon estimator: Methodology and users' guide. Discussion Paper E76/3, Economic Department, Reserve Bank of New Zealand.

Weierstrass, K., 1885. Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen. Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, (II).

© 2017, David E. Giles

#### 4 comments:

1. Thanks, sir, for such a nice post. But you didn't touch on the issue of selecting the lag length and the degree of the polynomial. Please address this issue in some post. One other point of confusion that I need to clarify is whether it is necessary that the y and x variables be stationary in order to apply the Almon technique.

1. Thanks for the comment. I'll add a couple of comments to the text and say more in a later post. Yes, the data have to be stationary - it's just a regression model with restrictions on the coefficients.

2. I always wondered: didn't C. A. Sims prove that the Almon approximation should not be used, as it really doesn't approximate? I am referring to the following article: http://www.jstor.org/stable/pdf/2284717.pdf?acceptTC=true&seq=1#page_scan_tab_contents.

In this article C. A. Sims shows that if we want an approximation, our goal is to get as close as possible to $y_t$ and not the actual lag coefficients. Hence we move to a different space and the Weierstrass result does not apply. From what I've read I got the impression that nobody expanded on Sims' idea. Can you comment on that?

Thanks by the way for mentioning MIDAS R implementation, I noticed that you did that several times already, so I am rolling all the thanks into one.

1. Thanks for the very useful comment. I'll take a look at Sims's article and comment later.