Signal extraction is a common pastime in empirical economics. When we fit a regression model we're extracting a signal about the dependent variable from the data, and separating it from the "noise". When we use the Hodrick-Prescott (HP) filter to extract the trend from a time-series, we're also engaging in signal extraction. In the case of a regression model we wouldn't dream of reporting estimated coefficients without their standard errors; or predictions without confidence bands. Why is it, then, that the trend that's extracted using the HP filter is always reported without any indication of the associated uncertainty?
Actually, I really don't know why this issue is generally ignored in practice, especially as there's an established literature that, at least implicitly, shows us how we can report confidence bands for HP-filtered data. The "trick" is to recognize that the HP filter can be re-cast as a regression problem.
I'm going to focus on the HP filter rather than its competitors, largely for expository purposes, but also because (for better or worse) it's probably the most widely used filter of its type.
First, let's recall what the HP filter is all about.
Suppose that we have a time-series, $y_t$, for $t = 1, 2, \ldots, T$. We assume that the data can be described as:
$$y_t = \tau_t + c_t ,$$
where $\tau_t$ represents the non-linear trend in the series, and $c_t$ is the cyclical component.
Then, the HP filter involves solving the following optimization problem:
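$$\min_{\{\tau_t\}} \left\{ \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \right]^2 \right\} , \tag{1}$$
where the minimization is over $\tau_t$, for $t = 1$ to $T$.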
In (1), the first summation is over t = 1 to T; and the second is over t = 2 to T-1. The first term in the objective function can be viewed as measuring "goodness of fit"; while the second term imposes a penalty for "roughness". The smoothing parameter, λ, is chosen by the user and there are well-known rules regarding its choice, depending on the frequency of the data.
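One such rule is the Ravn and Uhlig (2002) criterion that is used later in this post. As a minimal sketch, it rescales the conventional quarterly value of λ = 1600 by the fourth power of the frequency ratio. The function name and its arguments are made up for this illustration.

```python
def ravn_uhlig_lambda(obs_per_year, quarterly_lambda=1600.0):
    """Rescale the quarterly HP smoothing parameter to another data frequency.

    Ravn and Uhlig (2002) multiply the quarterly value by the fourth power
    of the frequency ratio (observations per year, divided by 4).
    """
    ratio = obs_per_year / 4.0
    return quarterly_lambda * ratio ** 4


print(ravn_uhlig_lambda(1))   # annual data    -> 6.25
print(ravn_uhlig_lambda(4))   # quarterly data -> 1600.0
print(ravn_uhlig_lambda(12))  # monthly data   -> 129600.0
```

Other conventions exist (λ = 100 is also often used with annual data), so the value should always be reported alongside the filtered series.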
Although economists usually attribute this filter to Hodrick and Prescott (1980, 1997), in fact it dates back to Leser (1961), and is based on early contributions by Whittaker (1922) and by Henderson (1924). Beginning with Danthine and Girardin (1989), several authors have noted that this optimization problem can be re-written in the following vector-matrix form:
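$$\min_{\tau} \left\{ c'c + \lambda (K\tau)'(K\tau) \right\} , \qquad c = y - \tau . \tag{2}$$
Here, $K = \{k_{ij}\}$ is a $[(T - 2) \times T]$ "second-differencing" matrix, with
$$k_{ij} = \begin{cases} 1 & \text{if } j = i \text{ or } j = i + 2 \\ -2 & \text{if } j = i + 1 \\ 0 & \text{otherwise.} \end{cases}$$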
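To make the definition of K concrete, here is a small NumPy sketch that builds the matrix entry by entry and checks that Kτ really is the vector of second differences of τ. The function name and the test vector are illustrative only.

```python
import numpy as np


def second_difference_matrix(T):
    """Return the (T - 2) x T matrix K whose i-th row is (..., 1, -2, 1, ...)."""
    K = np.zeros((T - 2, T))
    rows = np.arange(T - 2)
    K[rows, rows] = 1.0        # k_ij =  1 when j = i
    K[rows, rows + 1] = -2.0   # k_ij = -2 when j = i + 1
    K[rows, rows + 2] = 1.0    # k_ij =  1 when j = i + 2
    return K


K = second_difference_matrix(5)
tau = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # t squared, so every second difference is 2
print(K)
print(K @ tau)  # matches np.diff(tau, n=2)
```

Each row of K picks out τ(t+1) - 2τ(t) + τ(t-1), which is exactly the term in square brackets in the penalty in (1).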
The solution to this problem is:
$$\tau^* = [I + \lambda K'K]^{-1} y , \tag{3}$$
where $I$ is an identity matrix of order $T$. (In practice, care has to be taken over the inversion of the matrix in (3), as it can become ill-conditioned when λ is large.)
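As a rough sketch of how (3) can be computed, the following NumPy code solves the linear system rather than forming the inverse explicitly. This is not the EViews or gretl code that accompanies the post. The function name, the simulated series and the value of λ are assumptions made for the example.

```python
import numpy as np


def hp_trend(y, lam):
    """HP trend from equation (3): solve (I + lam * K'K) tau = y for tau."""
    y = np.asarray(y, dtype=float)
    T = y.size
    if T < 3:
        raise ValueError("need at least 3 observations")
    # Second-differencing matrix K, shape (T - 2, T), rows (1, -2, 1).
    K = np.zeros((T - 2, T))
    rows = np.arange(T - 2)
    K[rows, rows] = 1.0
    K[rows, rows + 1] = -2.0
    K[rows, rows + 2] = 1.0
    A = np.eye(T) + lam * (K.T @ K)
    # Solving the system avoids computing the inverse of A explicitly.
    return np.linalg.solve(A, y)


# Simulated stand-in for an annual growth-rate series (not the post's data).
rng = np.random.default_rng(0)
t = np.arange(50)
y = 0.03 + 0.0005 * t + 0.02 * rng.standard_normal(50)

trend = hp_trend(y, lam=6.25)
cycle = y - trend
print(trend[:5])
print(cycle[:5])
```

Two quick sanity checks follow from the algebra: a series that is exactly linear in t is returned unchanged (because Kτ = 0 for a linear trend), and the cyclical component sums to zero.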
When we look at this result, we see immediately that the HP filter can be interpreted as an application of Ridge Regression. Specifically, if we consider the "regression model"
$$y = I\tau + c , \tag{4}$$
then the (generalized) ridge estimator of τ is of the general form:
$$\tau^{**} = [I'I + \lambda A]^{-1} I'y = [I + \lambda A]^{-1} y . \tag{5}$$
Setting $A = K'K$ in (5), we see that $\tau^{**} = \tau^*$.
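A short derivation may help connect (2), (3) and (5). It is sketched here for a general regressor matrix X and a symmetric penalty matrix A. The symbols X and S are notation introduced for this illustration and do not appear elsewhere in the post.

```latex
\begin{aligned}
S(\tau) &= (y - X\tau)'(y - X\tau) + \lambda\, \tau' A \tau , \\
\frac{\partial S}{\partial \tau} &= -2X'(y - X\tau) + 2\lambda A \tau = 0 , \\
\Rightarrow \quad \hat{\tau} &= [X'X + \lambda A]^{-1} X'y .
\end{aligned}
```

With X = I and A = K'K, the objective S(τ) is exactly the one in (2), and the last line reduces to the HP solution in (3).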
Subsequently, Schlicht (2005) extended this analysis to allow for the simultaneous estimation of the smoothing parameter, λ, but this isn't going to be pursued further here. (See the comment from Ekkehart Schlicht that I added at the end of this post, just above the "References", on 28 December, 2020.)
If you've done any Bayesian econometrics, you'll also see right away that when written in the form (3), the HP filter can also be given a Bayesian interpretation, as was noted originally by Ley (2006), and very recently by Polasek (2011). If the cyclical component in (4) is assumed to be normally distributed with a variance of $\sigma^2$, and we use the natural-conjugate prior for the "parameters", so that $p(\tau \mid \sigma) \sim N[0, (\sigma^2/\lambda)(K'K)^{-1}]$, then the Bayes estimator of τ is given by (3). (Strictly, $K'K$ has rank $T - 2$, so the inverse in this prior has to be read as a generalized inverse; the prior is uninformative about any linear trend in τ.)
The bottom line from all of this is that the HP filter can be interpreted as an estimator for a particular regression model, and so we can easily construct the covariance matrix for this estimator. From this, we can get confidence intervals for each value in the τ vector - that is, we can get a confidence band for the extracted trend component.
From (3), note that the covariance matrix for the elements of τ* is given by
$$V(\tau^*) = [I + \lambda K'K]^{-1} \, V(y) \, [I + \lambda K'K]^{-1} . \tag{6}$$
The form of $V(y)$ will depend on the time-series under analysis, and under suitable assumptions this covariance matrix can be estimated from the data. Then, the square roots of the diagonal elements of the matrix in (6) will provide standard errors, and (at least asymptotically) a 95% confidence band series for the HP filter can be constructed as $\{\tau^*_t - 1.96\,\text{s.e.}(\tau^*_t) ,\ \tau^*_t + 1.96\,\text{s.e.}(\tau^*_t)\}$; $t = 1, 2, \ldots, T$.
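Equations (3) and (6) translate almost line for line into NumPy. The sketch below takes whatever estimate of V(y) is supplied and returns the trend with its standard errors. The function name, the simulated data and the choice of V(y) = s²I in the usage lines are assumptions for illustration, with the scalar case used only because it is the simplest one to write down.

```python
import numpy as np


def hp_trend_and_se(y, lam, V_y):
    """HP trend from (3) and standard errors from the diagonal of (6)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    # Second-differencing matrix K, shape (T - 2, T), rows (1, -2, 1).
    K = np.diff(np.eye(T), n=2, axis=0)
    A = np.eye(T) + lam * (K.T @ K)
    trend = np.linalg.solve(A, y)
    # A and V_y are symmetric, so A^{-1} V_y A^{-1} = solve(A, solve(A, V_y).T).
    V_trend = np.linalg.solve(A, np.linalg.solve(A, V_y).T)
    se = np.sqrt(np.diag(V_trend))
    return trend, se


# Simulated stand-in for an annual growth-rate series (not the post's data).
rng = np.random.default_rng(1)
y = 0.03 + 0.02 * rng.standard_normal(60)

# Simplest case only: V(y) = s^2 I, with s^2 the sample variance of y.
V_y = np.var(y, ddof=1) * np.eye(y.size)
trend, se = hp_trend_and_se(y, lam=6.25, V_y=V_y)
lower, upper = trend - 1.96 * se, trend + 1.96 * se
print(np.column_stack([lower, trend, upper])[:5])
```

In this scalar case the standard errors are largest at the two ends of the sample, which is the familiar end-point behaviour of the two-sided HP filter.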
From (4), note that $V(y) = V(c)$, where $c$ is the vector of observations on the cyclical component of the time-series. So, it would be most unrealistic to assume that $V(y) = \sigma^2 I$. Instead, an ARIMA model for the series could be identified and estimated, yielding an estimate of the $V(y)$ matrix for substitution into (6).
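For example, if the series follows a stationary AR(1) process with autoregressive coefficient φ and innovation variance σ²(ε), the standard result is that Cov(y_t, y_s) = σ²(ε) φ^|t-s| / (1 - φ²). The sketch below builds that matrix and substitutes it into (6). The function name and the numerical values of φ, σ²(ε), T and λ are illustrative assumptions, not estimates from the post.

```python
import numpy as np


def ar1_covariance(T, phi, sigma2_eps):
    """Covariance matrix of T consecutive observations from a stationary AR(1)."""
    if abs(phi) >= 1.0:
        raise ValueError("a stationary AR(1) needs |phi| < 1")
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])  # |t - s| for every pair of dates
    return sigma2_eps / (1.0 - phi ** 2) * phi ** lags


T, lam = 60, 6.25
V_y = ar1_covariance(T, phi=0.4, sigma2_eps=0.0004)  # illustrative values only

# Substitute V(y) into equation (6).
K = np.diff(np.eye(T), n=2, axis=0)  # second-differencing matrix, rows (1, -2, 1)
A = np.eye(T) + lam * (K.T @ K)
V_trend = np.linalg.solve(A, np.linalg.solve(A, V_y).T)  # A^{-1} V(y) A^{-1}
se_ar1 = np.sqrt(np.diag(V_trend))
print(se_ar1[:3], se_ar1[T // 2])
```

In practice φ and σ²(ε) would be replaced by estimates from the fitted model, and a richer ARMA specification would need its own autocovariance function.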
Now, let's look at an application of some of these results by applying the HP filter to some real data, and then reporting a confidence band for the extracted trend component. The data that I'll use can be found on the Data page for this blog. The associated EViews workfile and program file, and the gretl data file and script file, are on the Code page.
Here's the time-series I'm going to filter. It's a series of annual data for the growth rate of real Canadian GDP. A more detailed definition is given in the accompanying data file. The usual unit root tests indicate that the series is (trend-) stationary.
Using EViews to apply the HP filter with the value of λ chosen according to the Ravn and Uhlig (2002) criterion for annual data, here is what I get:
Incidentally, if you look in the EViews file, you'll see that the trend and cyclical components generated by the package are identical to the series generated using my program and equation (3) above.
Now, let's consider some 95% confidence bands for the extracted trend component. First, I'm going to apply equation (6) with the unrealistic assumption that $V(y) = \sigma^2 I$, and I'll estimate $\sigma^2$ by using the sample variance of the original data, namely $s^2 = 0.000483$. The trend and its confidence bands appear in the next figure:
Now let's try and do a better job with the estimation of the covariance matrix, V(y). Recall that the series we're analyzing is stationary. So, let's do some ARIMA modelling. The correlogram for the GDP growth rate series looks like this:
I'm going to identify an AR(1) model from this, and the estimation results are:
The correlogram for the residuals of this regression, and the Breusch-Godfrey LM test for serial independence, suggest that there is no autocorrelation of any order in the residuals. I'm happy to stick with my AR(1) specification for the GDP growth rate series.
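For readers without EViews or gretl to hand, an AR(1) model with an intercept can be fitted by ordinary least squares in a few lines. This is only a sketch of the idea, not the estimation routine used for the results reported here. The function name and the artificial series are assumptions.

```python
import numpy as np


def fit_ar1(y):
    """OLS fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi, sigma2_eps)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size - 1), y[:-1]])  # intercept and lagged y
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    sigma2_eps = resid @ resid / (resid.size - 2)  # two estimated coefficients
    return coef[0], coef[1], sigma2_eps


# Artificial AR(1) series with known parameters, for illustration only.
rng = np.random.default_rng(2)
y = np.empty(200)
y[0] = 0.03
for t in range(1, y.size):
    y[t] = 0.018 + 0.4 * y[t - 1] + 0.02 * rng.standard_normal()

c_hat, phi_hat, sigma2_hat = fit_ar1(y)
print(c_hat, phi_hat, sigma2_hat)
```

The estimates of φ and the innovation variance are the two ingredients needed to build an AR(1) version of V(y) for equation (6).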
[HT to Riccardo (Jack) Lucchetti for pointing out a silly error in my earlier EViews program code that gave a somewhat different result, and for supplying the gretl script that is now available on the Code page for this blog.]
Notice that the confidence bands based on the AR(1) model are a little less informative than are those based on the assumption that V(y) is scalar. Of course, we can't presume that this relationship between the confidence bands will arise with other time-series or other time-periods.
The take-away message from this post is very simple:
It's straightforward to construct confidence bands for the trend that is extracted from a time-series using the HP filter.
I noted at the beginning of the post that the ideas used here are implicit in the existing literature. However, I don't recall having seen anyone take them up in this way, and construct confidence bands. Have you?
Ekkehart Schlicht kindly emailed me the following comment, after this blog was closed:
"In my article that you kindly mentioned I gave the variance-covariance matrix of the trend estimate in equation (6.3). This is valid for any smoothing parameter. The square roots of the main diagonal elements give the standard deviations. That the confidence band is broader at the beginning and at the end of the time series is no shortcoming; rather it results from the fact that the HP filter is two-sided while the Kalman filter is one-sided and does not use all information for the intermediate estimates.
Johannes Ludsteck has a Mathematica package for the HP filter available at https://library.wolfram.com/infocenter/MathSource/5161/ "
References
Danthine, J-P. and M. Girardin, 1989. Business cycles in Switzerland. European Economic Review, 33, 31-50.
Henderson, R., 1924. A new method of graduation. Transactions of the Actuarial Society of America, 25, 29-40.
Hodrick, R. J. and E. C. Prescott, 1980. Postwar U.S. business cycles: An empirical investigation. Discussion Paper No. 451, Department of Economics, Carnegie Mellon University.
Hodrick, R. J. and E. C. Prescott, 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit, and Banking, 29, 1-16.
Leser, C. E. V., 1961. A simple method of trend construction. Journal of the Royal Statistical Society, B, 23, 91-107.
Ley, E., 2006. The Hodrick-Prescott filter. Knowledge Brief for Bank Staff, The World Bank, Washington D.C.
Polasek, W., 2011. The Hodrick-Prescott (HP) filter as a Bayesian regression model. WP 11-46, The Rimini Centre for Economic Analysis, Rimini, Italy.
Ravn, M. O. and H. Uhlig, 2002. On adjusting the Hodrick-Prescott filter for the frequency of observations. Review of Economics and Statistics, 84, 371-376.
Schlicht, E., 2005. Estimating the smoothing parameter in the so-called Hodrick-Prescott filter. Journal of the Japan Statistical Society, 35, 99-119.
Whittaker, E. T., 1922. On a new method of graduation. Proceedings of the Edinburgh Mathematical Society, 41, 63-75.
Really nice post, Prof. Giles.
My remark is that the H-P filter is a two-sided filter, so the fit in the upper and lower tails is worse than in the middle of the sample.
I think that this fact is ignored when we compute the confidence interval as you suggest, no?
Another point I have is about the long tradition of choosing the value of lambda based on the frequency of observations. Harvey and Trimbur (2008) show that it is not the case: the volatility of the series plays a role as well (http://www.terrapub.co.jp/journals/jjss/pdf/3801/38010041.pdf).
Though I know you haven't pursued this point, it might be important to mention it here, in the comments.
By the way, I am a loyal follower of the blog!
Best,
Pedro
Pedro: Thanks for the very thoughtful and relevant comments.
A great post followed by a great comment.
Anonymous: Thanks.
Great post, David. A few things come immediately to mind.
First, the wider confidence intervals at the end of the sample reinforce the well-known problems with the HP filter at sample end points.
Second, in much of the calibration literature, the HP filter is applied to log levels rather than growth rates and I wonder about the complications this raises since the underlying series is not stationary.
Finally, it is worth thinking about this in the context of the calibration literature. The goal is to match the moments (standard deviations or correlations) of a calibrated model to the de-trended data. With the confidence intervals, though, such an exercise seems much less straightforward.
Graham: Thanks very much for the extremely constructive comments. In order:
1. I agree, absolutely.
2. Good point. I deliberately chose a stationary series so that I could undertake the ARIMA part of the analysis without further complications. The HP filter itself is known to be OK with data that are integrated to an order no more than the frequency, so that part is O.K. However, more care would have to be taken over estimating V(y) in the case of non-stationary data.
3. Very interesting comment! Yes, there would now be an additional level of uncertainty to take into account. One crude option would be to do the matching using the lower confidence band (as if it were the HP filtered data); then repeat the exercise using the upper confidence band in the same way. This would at least allow some sort of sensitivity analysis, I guess.
Thanks again!
Unless I am confused, I think that the K matrix defined after equation (2), like this:
kij = 1 (if i = j, or j = i + 2); = -2 (if j = i + 1); = 0 (otherwise).
should rather be so:
kij = -1 (if i = j); = 1 (if j = i + 1); = 0 (otherwise).
This is to get a vector containing [y2-y0; y3-y1; ... ; yT-yT-2], and then get the norm of that with (Kτ)' (Kτ).
please do correct me if I am wrong.