Tuesday, May 22, 2012

Log Transformations & Forecasting

I enjoyed reading the lead article in the latest issue of Empirical Economics, by Helmut Lütkepohl and Fang Xu. It assesses the quality of forecasts obtained from an ARIMA model that is estimated using the levels of the data in question, as opposed to forecasts that are generated from a model estimated from the logarithms of the data.

Should we apply a log transform to time-series data before estimating an ARIMA model, or not? Some authors resort to this as a way of stabilizing the variance of the series prior to estimation. However, as Tom Reilly correctly pointed out in a comment on a previous post of mine, this is not always a good idea. In addition, there are other reasons for applying a log transform, especially when testing for unit roots and cointegration.
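
Whether a log transform actually stabilizes the variance can be checked informally before committing to it. The sketch below is an illustration only, not anything from the paper. It simulates a hypothetical series with multiplicative shocks (the drift, shock size, sample length, and seed are all arbitrary assumptions) and compares the spread of the first differences in the two halves of the sample, in levels and in logs.

```python
import math
import random
import statistics

# Hypothetical data: a series that grows with multiplicative shocks.
# The drift (0.03), shock scale (0.05), length and seed are arbitrary.
rng = random.Random(0)
levels = [1.0]
for _ in range(200):
    levels.append(levels[-1] * math.exp(0.03 + 0.05 * rng.gauss(0.0, 1.0)))
logs = [math.log(y) for y in levels]


def diff(series):
    """First differences of a sequence."""
    return [b - a for a, b in zip(series, series[1:])]


def spread_ratio(series):
    """Std. dev. of the differences in the second half over the first half."""
    d = diff(series)
    half = len(d) // 2
    return statistics.stdev(d[half:]) / statistics.stdev(d[:half])


levels_ratio = spread_ratio(levels)  # well above 1: spread grows with the level
logs_ratio = spread_ratio(logs)      # near 1: spread is roughly constant

print(f"levels: {levels_ratio:.2f}, logs: {logs_ratio:.2f}")
```

A ratio near 1 in logs and far from 1 in levels is what "variance stabilizing" looks like for this kind of series. For a series whose spread does not scale with its level, the same check can come out the other way.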

While quite a bit has been written about logs versus levels when estimating an ARIMA model, less has been said about the impact of this choice on out-of-sample predictive performance. That's what the Lütkepohl and Xu paper focuses on.

What do they conclude?

In summary:
  • If the log transformation does indeed stabilize the within-sample variance of the series, this often leads to an improvement in forecasting performance (in terms of MSE) for the levels of the data.
  • If you use a log transformation inappropriately, then this generally has adverse implications for forecast MSE of the levels of the data.
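
One practical detail sits behind both bullets: the model is fitted in logs, but the forecast MSE is judged in levels, so the log forecast has to be converted back. The sketch below is not taken from the paper. It assumes the forecast error on the log scale is normal with a known variance. In that case, exponentiating the log forecast gives the median of the level, while the mean (the quantity that minimizes MSE) carries an extra factor of exp(variance/2). The function names and the numbers are made up for illustration.

```python
import math


def naive_level_forecast(log_forecast):
    """Exponentiate the log forecast (the median of the level under lognormality)."""
    return math.exp(log_forecast)


def mean_level_forecast(log_forecast, log_error_variance):
    """Mean of the level, assuming a normal forecast error on the log scale."""
    return math.exp(log_forecast + 0.5 * log_error_variance)


def mse(forecasts, actuals):
    """Mean squared error, computed on the levels."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


# Illustrative numbers only: a log forecast of log(100) with error variance 0.04.
log_forecast = math.log(100.0)
naive = naive_level_forecast(log_forecast)
corrected = mean_level_forecast(log_forecast, 0.04)
print(naive, corrected)
```

When the error variance on the log scale is small, the two level forecasts are nearly identical. The gap widens as that variance grows.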

Perhaps not too surprising, but it's a nice paper, with a number of important messages for time-series practitioners.

My own take-away message: be careful with your choice of data transformations and model specifications. Just because other people make particular choices doesn't mean that those choices are right for your particular context - or even right at all!


Lütkepohl, H. & F. Xu, 2012. The role of the log transformation in forecasting. Empirical Economics, 42, 619-638. (W.P. version available here.)

© 2012, David E. Giles


  1. Dave, in financial applications a lot of thought has gone into this. In particular, I would focus your attention to:

    In particular, I don't use the log to stabilize variance. Indeed, if you take the logs of interest rates that have come near the zero lower bound, it will NOT be the case that the variance is more stable. The main reasons to use it are that it suits variables that cannot fall below zero, and that the distribution of log changes is roughly symmetrical, which is not the case for linear changes. Also, it is convenient to project to the horizon and convert back to linear returns if necessary.

    1. Thanks John - I understand what you're saying.
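
The point in the comment above about log changes versus linear changes can be seen with a two-period example. This is only an illustrative sketch with made-up prices. A fall from 100 to 50 followed by a recovery to 100 is asymmetric in linear changes but equal and opposite in log changes, and log changes add up across periods before being converted back to a linear return.

```python
import math

prices = [100.0, 50.0, 100.0]  # made-up prices

linear_changes = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
log_changes = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Log changes are additive over time, so the change over the whole horizon
# is their sum; expm1 converts that sum back to a linear return.
horizon_log_change = sum(log_changes)
horizon_linear_return = math.expm1(horizon_log_change)

print(linear_changes)
print(log_changes)
print(horizon_linear_return)
```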