Econometrics Beat: Dave Giles' Blog: Explaining the Almon Distributed Lag Model

Friday, January 6, 2017

Explaining the Almon Distributed Lag Model

In an earlier post I discussed Shirley Almon's contribution to the estimation of Distributed Lag (DL) models, with her seminal paper in 1965.

That post drew quite a number of email requests for more information about the Almon estimator, and how it fits into the overall scheme of things. In addition, Almon's approach to modelling distributed lags has been used very effectively more recently in the estimation of the so-called MIDAS model. The MIDAS model (developed by Eric Ghysels and his colleagues - e.g., see Ghysels et al., 2004) is designed to handle regression analysis using data with different observation frequencies. The acronym, "MIDAS", stands for "Mixed-Data Sampling". The MIDAS model can be implemented in R, for instance (e.g., see here), as well as in EViews. (I discussed this in this earlier post.)

For these reasons I thought I'd put together this follow-up post by way of an introduction to the Almon DL model, and some of the advantages and pitfalls associated with using it.

Let's take a look.
Suppose that we want to estimate the coefficients of the following DL model:

y_t = β₀ x_t + β₁ x_t-1 + β₂ x_t-2 + ... + β_n x_t-n + u_t ; t = 1, 2, ..., T.     (1)

This is called a "finite" DL model if the value of n is finite.

We could add an intercept into the model, and/or add other regressors, but that won't alter the basic ideas in the following discussion. So let's keep the model as simple as possible. We'll presume that the error term, u_t, satisfies all of the usual assumptions - but that can be relaxed too.

If the maximum lag length in the model, n, is much less than T, then we could just apply OLS to estimate the regression coefficients. However, even if this is feasible, in the sense that there are positive degrees of freedom, this may not be the smartest way in which to proceed. For most economic time-series, x, the successive lags of the variable are likely to be highly correlated with each other. Inevitably, this will result in quite severe multicollinearity.
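A quick way to see the problem is to look at the correlation matrix of the lagged regressors. The sketch below is only an illustration: it assumes a simulated AR(1) series with an autoregressive coefficient of 0.9, and the sample size, seed, and lag length are arbitrary choices rather than anything taken from real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical persistent regressor: x_t = 0.9 * x_{t-1} + e_t
T, rho, n = 500, 0.9, 4
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + e[t]

# Column i holds x_{t-i}; rows run over t = n, ..., T-1 so every lag exists.
lags = np.column_stack([x[n - i : T - i] for i in range(n + 1)])

# Sample correlations between x_t, x_{t-1}, ..., x_{t-n}
corr = np.corrcoef(lags, rowvar=False)
print(np.round(corr, 2))
```

With a series this persistent, the off-diagonal entries should be large, which is the multicollinearity that the Almon approach is meant to sidestep.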

How can we deal with this?

In response, Shirley Almon (1965) suggested a pretty neat way of re-formulating the model prior to its estimation. She made use of Weierstrass's Approximation Theorem, which tells us (roughly) that: "Every continuous function defined on a closed interval [a, b] can be uniformly approximated, arbitrarily closely, by a polynomial function of finite degree, P."

Notice that the theorem doesn't tell us what the value of P will be. This presents a type of model-selection problem that we have to solve. The flip-side of this is that if we select a value for P, and get it wrong, then there will be model mis-specification issues that we have to face. In fact, we can re-cast these issues in terms of those associated with the incorrect imposition of linear restrictions on the parameters of our model.

(Almon actually used Lagrangian interpolation in her application of Weierstrass's Theorem to this problem, but there's a simpler (and numerically equivalent) way of describing her idea.)

Let's look into this model in more detail.

Here's equation (1) again:

y_t = β₀ x_t + β₁ x_t-1 + β₂ x_t-2 + ... + β_n x_t-n + u_t ; t = 1, 2, ..., T.     (1)

What we're going to do is to treat the values of the regression coefficients, β_i, as unknown functions of "i". That is, we'll set β_i = g(i). Then we'll approximate g(i) using a polynomial, f(i), of order P. Typically, P will take a small value, such as 2, 3, or 4.

That is, we'll write:

β_i = a₀ + a₁ i + a₂ i² + ... + a_P i^P ; i = 0, 1, 2, ..., n     (2)

If we set P = 3, here's an example of what we're imposing on the problem:
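To make the shape concrete, the polynomial in (2) can be evaluated for a hypothetical set of coefficients. The sketch below assumes n = 8, P = 3, and made-up values for a₀, ..., a₃ (they are not taken from Almon's paper or from any data). It treats the lag index as running from i = 0, so that β₀ = a₀.

```python
import numpy as np

n, P = 8, 3
a = np.array([0.10, 0.30, -0.08, 0.005])  # hypothetical a_0, ..., a_3

i = np.arange(n + 1)
# H[i, j] = i**j, so row i of H @ a is a_0 + a_1*i + a_2*i**2 + a_3*i**3
H = np.vander(i, P + 1, increasing=True)
beta = H @ a

for lag, b in zip(i, beta):
    print(f"beta_{lag} = {b:.3f}")
```

Only the four values in `a` are free here; the nine lag weights are all determined by them.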

Substituting (2) into (1), we get:
y_t = a₀ x_t + (a₀ + a₁ + a₂ + ... + a_P) x_t-1 + (a₀ + 2a₁ + 4a₂ + ... + 2^P a_P) x_t-2 + ...
+ (a₀ + n a₁ + n² a₂ + ... + n^P a_P) x_t-n + u_t ; t = 1, 2, ..., T.     (3)

Re-arranging the right-hand side of (3), and gathering up terms, we get:

y_t = a₀ (x_t + x_t-1 + x_t-2 + ... + x_t-n) + a₁ (x_t-1 + 2x_t-2 + ... + n x_t-n) + a₂ (x_t-1 + 4x_t-2 + 9x_t-3 + ... + n² x_t-n)
+ ... + a_P (x_t-1 + 2^P x_t-2 + ... + n^P x_t-n) + u_t ; t = 1, 2, ..., T.     (4)

If we've decided on a maximum lag-length (n), and we have chosen a degree (P) for the approximating polynomial, f(.), then we can re-write (4) as:

y_t = a₀ z_0t + a₁ z_1t + a₂ z_2t + ... + a_P z_Pt + u_t ; t = 1, 2, ..., T.     (5)
where:

z_0t = (x_t + x_t-1 + x_t-2 + ... + x_t-n)

z_1t = (x_t-1 + 2x_t-2 + ... + n x_t-n)

z_2t = (x_t-1 + 4x_t-2 + 9x_t-3 + ... + n² x_t-n)
.
.
.
z_Pt = (x_t-1 + 2^P x_t-2 + ... + n^P x_t-n) .

Notice that if P is much smaller than n, then the number of regression coefficients that have to be estimated in (5) is far smaller than in (1): there are (P + 1) coefficients in (5), compared with (n + 1) in (1). We have effectively imposed (n - P) exact linear restrictions on the original coefficient vector, so what we have is a particular application of restricted least squares. If these restrictions are incorrect, then there will be serious implications for the properties of our final estimator. On the positive side, however, the z variables are likely to exhibit far less multicollinearity than do the successive lags of x itself in model (1).
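One way to see where the (n - P) restrictions come from is to write everything in matrix form. This is a standard restatement rather than necessarily the route taken in Almon's paper, and the symbols H, X, and Z are notation introduced here for the sketch. A sequence lies on a polynomial of degree at most P exactly when its (P + 1)-th differences vanish, and there are n - P such differences among β₀, ..., β_n.

```latex
% Stack the lag weights and the polynomial coefficients:
%   beta = (beta_0, ..., beta_n)',  a = (a_0, ..., a_P)'
\[
  \beta = H a, \qquad H_{ij} = i^{\,j}, \quad i = 0, \dots, n, \; j = 0, \dots, P .
\]
% With X holding the columns x_t, x_{t-1}, ..., x_{t-n}, model (1) becomes model (5):
\[
  y = X\beta + u = (XH)\,a + u = Z a + u .
\]
% Equivalent statement as exact linear restrictions on beta:
\[
  \Delta^{P+1}\beta_i
  = \sum_{k=0}^{P+1} (-1)^{P+1-k} \binom{P+1}{k}\, \beta_{i+k} = 0,
  \qquad i = 0, 1, \dots, n-P-1 ,
\]
% which is n - P restrictions. For P = 2 each one reads
\[
  \beta_{i+3} - 3\beta_{i+2} + 3\beta_{i+1} - \beta_i = 0 .
\]
```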

For a given n and P, we can construct the z variables, then estimate a₀, a₁, ..., a_P by applying OLS to (5), and finally "recover" the estimates for the β_i's using (2):

β_i* = a₀* + a₁* i + a₂* i² + ... + a_P* i^P ; i = 0, 1, 2, ..., n     (6)

where a * superscript denotes an OLS estimate.
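The three steps (build the z variables, run OLS, recover the lag weights) can be sketched on simulated data. Everything numerical below is an assumption made for illustration: the regressor is white noise, the "true" polynomial coefficients are made up, and the error standard deviation of 0.1 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n, P = 400, 6, 2

# Hypothetical "true" polynomial coefficients and implied lag weights
a_true = np.array([0.5, 0.3, -0.06])
H = np.vander(np.arange(n + 1), P + 1, increasing=True)  # H[i, j] = i**j
beta_true = H @ a_true

# Simulate x and y from the DL model (1)
x = rng.standard_normal(T)
lags = np.column_stack([x[n - i : T - i] for i in range(n + 1)])
y = lags @ beta_true + 0.1 * rng.standard_normal(T - n)

# Step 1: z variables.  Step 2: OLS on (5).  Step 3: recover beta via (6).
Z = lags @ H
a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_hat = H @ a_hat

print(np.round(np.column_stack([beta_true, beta_hat]), 3))
```

Because the simulated lag weights really do lie on a quadratic, the recovered estimates should sit close to the true values. That would not be guaranteed if the polynomial restriction were false.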

Because the relationship between the β_i*'s and the a_j*'s in (6) is a linear one, it is trivial to "recover" the standard errors for the former estimates from the covariance matrix associated with the latter estimates.
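In matrix terms, if β* = H a* then the covariance matrix of β* is H V H', where V is the estimated covariance matrix of a*. The sketch below assumes the usual OLS covariance estimator s²(Z'Z)⁻¹ and uses simulated data with arbitrary parameter values; H and V are notation introduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, P = 300, 5, 2

x = rng.standard_normal(T)
lags = np.column_stack([x[n - i : T - i] for i in range(n + 1)])
H = np.vander(np.arange(n + 1), P + 1, increasing=True)  # H[i, j] = i**j
Z = lags @ H
y = Z @ np.array([0.4, 0.2, -0.05]) + 0.5 * rng.standard_normal(T - n)

# OLS on the z variables, with the usual covariance estimate
a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ a_hat
s2 = resid @ resid / (Z.shape[0] - Z.shape[1])
V_a = s2 * np.linalg.inv(Z.T @ Z)

# Linear transformation: Cov(beta*) = H Cov(a*) H'
V_beta = H @ V_a @ H.T
se_beta = np.sqrt(np.diag(V_beta))
print(np.round(se_beta, 4))
```

Note that `V_beta` is (n + 1) x (n + 1) but has rank of only P + 1, which is another reflection of the restrictions being imposed.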

All of this is described in some detail in an old discussion paper by Smith and Giles (1976), referenced below.

Now let's consider a specific example, to make all of this more "concrete".

An Example

Here's the original model, again:

y_t = β₀ x_t + β₁ x_t-1 + β₂ x_t-2 + ... + β_n x_t-n + u_t ; t = 1, 2, ..., T.     (7)

Let's choose P = 2. This is very restrictive indeed, in terms of the "shapes" that the lag distribution can take. However, it will simplify the discussion here. More complex (and realistic) cases are discussed in detail by Smith and Giles (1976).

So, we have:

β_i = a₀ + a₁ i + a₂ i² ; i = 0, 1, 2, ..., n     (8)

which implies that

y_t = a₀ x_t + (a₀ + a₁ + a₂) x_t-1 + (a₀ + 2a₁ + 4a₂) x_t-2 + ... + (a₀ + n a₁ + n² a₂) x_t-n + u_t ; t = 1, 2, ..., T

or,

y_t = a₀ (x_t + x_t-1 + x_t-2 + ... + x_t-n) + a₁ (x_t-1 + 2x_t-2 + ... + n x_t-n)
+ a₂ (x_t-1 + 4x_t-2 + ... + n² x_t-n) + u_t ; t = 1, 2, ..., T

or,

y_t = a₀ z_0t + a₁ z_1t + a₂ z_2t + u_t ; t = 1, 2, ..., T     (9)

Here,

z_0t = (x_t + x_t-1 + x_t-2 + ... + x_t-n)

z_1t = (x_t-1 + 2x_t-2 + ... + n x_t-n)

z_2t = (x_t-1 + 4x_t-2 + ... + n² x_t-n) ; t = 1, 2, ..., T

We construct the z variables; estimate the coefficients in (9) by OLS; and then create estimates of the original β_i's using (8). Effectively, we now have (particular) restricted least squares estimates of the original coefficients in (7). Everything that you know about restricted least squares applies here!
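The restricted least squares interpretation can be checked numerically. The sketch below assumes simulated data with arbitrary values. It writes the P = 2 restrictions as "third differences of β equal zero" (the matrix R is notation introduced here), applies the textbook restricted least squares formula, and compares the result with the Almon estimates obtained from the z variables.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, P = 300, 5, 2

x = rng.standard_normal(T)
X = np.column_stack([x[n - i : T - i] for i in range(n + 1)])
beta_sim = np.array([0.2, 0.5, 0.6, 0.5, 0.2, -0.3])  # hypothetical weights
y = X @ beta_sim + 0.3 * rng.standard_normal(T - n)

# Almon route: regress y on Z = X H, then beta = H a
H = np.vander(np.arange(n + 1), P + 1, increasing=True)
a_hat, *_ = np.linalg.lstsq(X @ H, y, rcond=None)
beta_almon = H @ a_hat

# RLS route: unrestricted OLS, then impose R beta = 0, where each row of R
# takes a (P + 1)-th difference of the lag weights (n - P rows in total)
R = np.diff(np.eye(n + 1), n=P + 1, axis=0)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
XtX_inv = np.linalg.inv(X.T @ X)
adjust = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b_ols)
beta_rls = b_ols - adjust

print(np.max(np.abs(beta_almon - beta_rls)))  # should be numerically zero
```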

An Extension

Often, we'll have economic information that will suggest something further about the pattern (lag distribution) that the values of the β_i's should follow. For instance, we may know that it makes sense for the lag weights to "die out" to zero when i = n+1. Or we may want the slope of the lag distribution to be zero when i = n. There are lots of such pieces of prior information that we may want to impose on the problem, and some of these are discussed by Smith and Giles (1976), together with graphs and details of the associated formulae.

Of course, these shape restrictions add to those already in play as a result of choosing a value for P. They further increase the chance that we may be imposing false restrictions on the parameter space, and this would lead our OLS estimates to be both biased and inconsistent. So, extreme care should be taken, and there are some important model-selection issues to be taken into account here.
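A small simulation can illustrate what a false restriction does. The sketch below uses the simplest kind of false restriction, a polynomial degree that is too low, rather than an end-point restriction. All the values are assumptions chosen for illustration: the true lag weights decay geometrically, the regressor is white noise, and a straight line (P = 1) is forced on the weights.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, P = 20000, 6, 1

# Hypothetical true weights: geometric decay, which no straight line can match
beta_true = 0.6 ** np.arange(n + 1)

x = rng.standard_normal(T)
lags = np.column_stack([x[n - i : T - i] for i in range(n + 1)])
y = lags @ beta_true + rng.standard_normal(T - n)

# Almon estimates with the (false) linear shape imposed
H = np.vander(np.arange(n + 1), P + 1, increasing=True)
a_hat, *_ = np.linalg.lstsq(lags @ H, y, rcond=None)
beta_almon = H @ a_hat

# Unrestricted OLS on model (1) for comparison
b_ols, *_ = np.linalg.lstsq(lags, y, rcond=None)

print(np.round(np.column_stack([beta_true, beta_almon, b_ols]), 3))
```

Under these assumptions the restricted estimate of β₀ stays well away from its true value of 1 even with 20,000 observations, while the unrestricted estimates do not. That is what inconsistency looks like in practice.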

Let's illustrate this by extending the previous example. We'll stick with the choice of P = 2, but we'll add the restriction that the derivative of f(i) should be zero when i = n. This is Case 5 in the paper by Smith and Giles (1976).

Noting that β_i = f(i) = a₀ + a₁i + a₂i² , it follows that f '(i) = a₁ + 2a₂i, and we're going to set

f '(n) = a₁ + 2na₂= 0 ,

implying that a₁ = - 2na₂.

Now we can eliminate a₁ from the problem (that's another linear restriction that we're imposing, right there).

Looking back at equation (9) we can see that the resulting equation that we'll be estimating is now of the form:

y_t = a₀ z_0t + a₂ (z_2t - 2n z_1t) + u_t ; t = 1, 2, ..., T     (10)

Then, the estimates of the original regression coefficients are

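β_i* = a₀* + a₁* i + a₂* i² ; i = 0, 1, ..., n,

where a₀* and a₂* are the OLS estimates from (10), and a₁* = -2n a₂* is obtained from the restriction.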
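The restricted regression (10) can be sketched in the same way as before. The data below are simulated under the assumption that the end-point restriction f '(n) = 0 actually holds, and the values of a₀, a₂, the sample size, and the error variance are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
T, n = 400, 6
i = np.arange(n + 1)

# Hypothetical true coefficients satisfying a1 = -2 * n * a2
a0, a2 = 0.1, -0.02
a1 = -2 * n * a2
beta_true = a0 + a1 * i + a2 * i**2

x = rng.standard_normal(T)
lags = np.column_stack([x[n - k : T - k] for k in range(n + 1)])
y = lags @ beta_true + 0.1 * rng.standard_normal(T - n)

# z variables for P = 2, then the combined regressor from (10)
z0 = lags.sum(axis=1)
z1 = lags @ i
z2 = lags @ i**2
W = np.column_stack([z0, z2 - 2 * n * z1])

coef, *_ = np.linalg.lstsq(W, y, rcond=None)
a0_hat, a2_hat = coef
a1_hat = -2 * n * a2_hat  # recovered from the restriction, not estimated freely

beta_hat = a0_hat + a1_hat * i + a2_hat * i**2
print(np.round(np.column_stack([beta_true, beta_hat]), 3))
```

Only two coefficients are estimated here, yet all n + 1 lag weights are recovered.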

So, what are the take-away messages here? They can be summarized pretty simply:

The Almon estimator provides a rather neat way of circumventing the multicollinearity problems that would arise if we simply estimated a DL model, with lots of lags, directly by OLS.
It does this by approximating the "shape" of the distribution of the lag coefficients through time by a polynomial of order P.
The value of P has to be chosen by the user, and this leads to a model-selection problem.
The choice of P also affects the form of certain exact linear restrictions that are effectively being placed on the regression coefficients.
This leads to the possibility that false restrictions are imposed, and this would lead to the resulting estimator being both biased and inconsistent.
Additional restrictions can be placed on the lag distribution, based on our knowledge of the underlying economics of the relationship we're estimating.
Applying such restrictions should also be undertaken with care, again to avoid adversely affecting the properties of our estimator.
The validity of some of the restrictions can be tested in the usual way. For instance, when we have fixed P and then we add end-point restrictions, the latter restrictions are "nested", so we can use a Wald test (for instance).
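As a rough illustration of the last point, here is a sketch of a Wald test of the end-point restriction a₁ + 2n a₂ = 0 within the unrestricted P = 2 model (9). It assumes simulated data for which the restriction is true, the usual OLS covariance estimator, and the asymptotic chi-square distribution with one degree of freedom. The vector R is notation introduced here.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
T, n = 400, 6
i = np.arange(n + 1)

# Hypothetical coefficients with a1 = -2 * n * a2, so the null is true
a_true = np.array([0.1, 0.24, -0.02])
H = np.vander(i, 3, increasing=True)  # columns 1, i, i**2

x = rng.standard_normal(T)
lags = np.column_stack([x[n - k : T - k] for k in range(n + 1)])
Z = lags @ H
y = Z @ a_true + 0.1 * rng.standard_normal(T - n)

# Unrestricted OLS on (9) and its estimated covariance matrix
a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ a_hat
s2 = resid @ resid / (Z.shape[0] - Z.shape[1])
V_a = s2 * np.linalg.inv(Z.T @ Z)

# H0: R a = 0 with R = (0, 1, 2n); a single restriction
R = np.array([0.0, 1.0, 2.0 * n])
wald = float((R @ a_hat) ** 2 / (R @ V_a @ R))

# Upper-tail probability of a chi-square(1) variate: P(X > w) = erfc(sqrt(w / 2))
p_value = math.erfc(math.sqrt(wald / 2.0))
print(f"Wald = {wald:.3f}, p-value = {p_value:.3f}")
```

A test of the polynomial degree itself could be set up along similar lines, by comparing a degree-P fit with a higher-degree one.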

It's worth commenting that the choice of P, and some of the potential model-selection and mis-specification issues that can plague the Almon DL estimator can also be resolved in a straightforward manner if one takes a Bayesian approach to the problem. For more details, and an empirical application, see Giles (1977).

References

Almon, S., 1965. The distributed lag between capital appropriations and net expenditures. Econometrica, 33, 178-196.

Ghysels, E., P. Santa-Clara, & R. Valkanov, 2004. The MIDAS touch: Mixed data sampling regression models. Mimeo.

Giles, D.E.A., 1977. Current payments for New Zealand's imports: A Bayesian analysis. Applied Economics, 9, 185-201.

Smith, R.G. & D.E.A. Giles, 1976. The Almon estimator: Methodology and users' guide. Discussion Paper E76/3, Economic Department, Reserve Bank of New Zealand.

Weierstrass, K., 1885. Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen. Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, (II).

13 comments:

Majid, January 7, 2017 at 3:09 AM
Thanks sir for such a nice post. But you didn't touch the issue of selection of lag length and degree of polynomial. Please address this issue in some post. One other confusion which I need to clarify is whether it is necessary that the y and x variables be stationary for applying the Almon technique.
mpiktas, January 9, 2017 at 3:07 AM
I always wondered: didn't C. A. Sims prove that the Almon approximation should not be used, as it really doesn't approximate? I am referring to the following article: http://www.jstor.org/stable/pdf/2284717.pdf?acceptTC=true&seq=1#page_scan_tab_contents.

In this article C. A. Sims shows that if we want an approximation, our goal is to get as close as possible to $y_t$ and not the actual lag coefficients. Hence we move to a different space and Weierstrass result does not apply. From what I've read I got the impression that nobody expanded on Sims' idea. Can you comment on that?

Thanks by the way for mentioning MIDAS R implementation, I noticed that you did that several times already, so I am rolling all the thanks into one.
Unknown, April 25, 2017 at 9:15 AM
Thanks for the post. I was curious to know how to find the most important predictor variables for distributed lag models. Is it similar to how we do it with normal regression using p values and stuff?
Unknown, September 4, 2018 at 10:26 AM
Thank you for this useful post. I would like to check out Smith and Giles (1976), but I cannot find it online. Would it be possible for you to link to it (or send it to me)? My email is snyde138@msu.edu --Thank you!
Unknown, November 21, 2018 at 3:57 AM
Hi David,
Thanks for this awesome post. While looking for R packages that estimate these models, I came across a package called dLagM. This package has a function for estimating the polynomial distributed lag (PDL) model -- polyDLM. However, this function only accepts a single x driver.
I am not sure why the author could not extend the idea to accepting multiple x drivers. Are you aware of other packages in R that do this estimation?
Unknown, July 25, 2019 at 2:11 PM
Hi, Thanks for the wonderful article.
I am working on gretl and using lagReg package and using "pdl" function of the package. Please help me how to specify the Matrix of PDL specifications. Lag order= 12, Degree of polynomial = 2
Please help me how to define the matrix.

Thanks in Advance