Thursday, December 4, 2014

More on Prediction From Log-Linear Regressions

My therapy sessions are actually going quite well. I'm down to just one meeting with Jane a week, now. Yes, there are still far too many log-linear regressions being bandied around, but I'm learning to cope with it!

Last year, in an attempt to be helpful to those poor souls, I had a post about forecasting from models with a log-transformed dependent variable. I felt decidedly better after that, so I thought I'd follow up with another good deed.

Let's see if it helps some more:

An important issue that arises very frequently is how to use a log-model to predict the levels of the dependent variable. More generally, the dependent variable may be of the form h(y), where h(.) is some function, but we want to use the estimated model to predict values of y itself - not h(y).

In the log-model case, lots of people just get the predictions of log(y) and then take the exponential of these predicted values. Unfortunately, that's not really the correct thing to do, and it can result in substantial distortions in your forecasts. That's what the previous post was about.

To re-cap, let's suppose that we're using a regression model of the form:

$$\log(y_t) = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + \varepsilon_t ; \quad t = 1, 2, \dots, n \tag{1}$$

where log(.) denotes the natural logarithm; the regressors are non-random; and the errors are assumed to be i.i.d. N[0 , σ2]. The assumption of normal errors is key.

Having estimated the model by (say) OLS, we have the "fitted" (predicted) values for the dependent variable. That is, we have values of [log(yt)]* = b1 + b2x2t + ..... + bkxkt, where bi is the OLS estimator for βi (i = 1, 2, ......, k).

Usually, of course, we'd be more interested in fitted values that are expressed in terms of the original data - that is, in terms of y itself, rather than log(y). As I discussed in the previous post, when we generate our predictions ("fitted values") of yt, based on our log-linear model, really we should create them as:

$$y_t^* = \exp\{[\log(y_t)]^* + (s^2 / 2)\}, \tag{2}$$

where

$$[\log(y_t)]^* = b_1 + b_2 x_{2t} + \dots + b_k x_{kt},$$

and s2 is the usual unbiased estimator of σ2, based on the OLS estimates of the semi-log model.

That is,

$$s^2 = \sum [\log(y_t) - b_1 - b_2 x_{2t} - \dots - b_k x_{kt}]^2 / (n - k),$$

where the range of summation is for t = 1, 2, ...., n.
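To make the arithmetic in (2) concrete, here is a minimal sketch in Python. It assumes a model with a constant and just one regressor so that OLS can be done in closed form; the data and the helper name `ols_simple` are made up for illustration and have nothing to do with the examples in this post.

```python
import math

def ols_simple(x, z):
    """OLS of z on a constant and one regressor x; returns (b1, b2, residuals)."""
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    b2 = sxz / sxx
    b1 = zbar - b2 * xbar
    resid = [zi - b1 - b2 * xi for xi, zi in zip(x, z)]
    return b1, b2, resid

# Made-up illustrative data (an assumption, not from the post).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.9, 4.1, 7.8, 10.5, 19.0, 24.2]
log_y = [math.log(v) for v in y]

b1, b2, resid = ols_simple(x, log_y)
k = 2                                            # number of estimated coefficients
s2 = sum(e ** 2 for e in resid) / (len(x) - k)   # unbiased estimator of sigma^2

x_new = 7.0                                      # hypothetical out-of-sample regressor value
log_pred = b1 + b2 * x_new                       # forecast of log(y)
naive = math.exp(log_pred)                       # naive forecast: exp of the log forecast
normal_adj = math.exp(log_pred + s2 / 2)         # forecast based on equation (2)
```

Because s2 is positive, the forecast from (2) always lies above the naive one, by the factor exp(s2 / 2).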

The extra term that enters the expression for the forecasts in (2) arises because we're assuming that the errors in the log-model are normally distributed. Specifically, it comes from the relationship between a normal, and a log-normal distribution. There's a big literature surrounding this point - see the references listed below.
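The source of that extra term can be sketched in a few lines. Here x_t'β is just shorthand (my assumption, not notation used elsewhere in this post) for β1 + β2x2t + ... + βkxkt, and the third line is the standard result for the mean of a log-normal random variable:

```latex
\begin{aligned}
y_t &= \exp(x_t'\beta)\,\exp(\varepsilon_t), \\
E[y_t] &= \exp(x_t'\beta)\,E[\exp(\varepsilon_t)], \\
E[\exp(\varepsilon_t)] &= \exp(\sigma^2/2) \quad \text{when } \varepsilon_t \sim N[0, \sigma^2], \\
E[y_t] &= \exp(x_t'\beta + \sigma^2/2).
\end{aligned}
```

Replacing β and σ2 with their estimates gives the form of the forecast in (2).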

However, what if we can't reasonably assume that the errors in (1) are normally distributed? In this case, (2) is not really applicable, and its use may actually do more harm than good. What can we do in this situation?

Duan (1983) suggests an interesting approach - a non-parametric one, so that we don't need to make any particular assumption about the distribution of the regression errors. He introduces a "smearing estimate" of the reverse transformation that's needed to extract a prediction of the original variable, after a transformation has been used for estimation purposes.

In the case where the transformation of the dependent variable is a logarithmic one, Duan's smearing estimate yields the following prediction:

$$y_t^{**} = \exp\{[\log(y_t)]^*\} \cdot (1 / n) \sum \exp(e_t), \tag{3}$$

where et = [log(yt) - b1 - b2x2t - ....... - bkxkt] is the tth residual when (1) is estimated by OLS.

The  yt**'s provide an alternative to the yt*'s, from (2).
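Here is a minimal Python sketch of the calculation in (3). The function names, the residuals, and the log-scale forecast are all hypothetical, chosen only to show the mechanics.

```python
import math

def smearing_factor(residuals):
    """Duan's smearing factor for a log transformation: (1/n) * sum(exp(e_t))."""
    return sum(math.exp(e) for e in residuals) / len(residuals)

def smearing_forecast(log_pred, residuals):
    """Equation (3): exp of the log-scale forecast, times the smearing factor."""
    return math.exp(log_pred) * smearing_factor(residuals)

# Hypothetical OLS residuals from a log-model. They sum to zero, as OLS
# residuals do when the model includes an intercept.
resid = [-0.30, 0.10, 0.25, -0.15, 0.40, -0.30]
log_pred = 2.0                      # hypothetical forecast of log(y)

factor = smearing_factor(resid)
y_smear = smearing_forecast(log_pred, resid)
```

When the residuals average to zero, Jensen's inequality implies that the smearing factor is at least one, so the forecast from (3) is never below the naive forecast.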

Duan notes that the transformations in both (2) and (3) yield consistent forecasts if the regression errors actually are normal. However, in this case the forecasts based on (2) will be more efficient. He investigates the loss in forecast efficiency that can arise when (3) is used in this case. If σ2 ≤ 1 the loss in predictive efficiency is actually very low.
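A quick simulation gives a feel for why the two adjustments agree under normality. This is only a sketch under my own assumptions: centred normal draws stand in for regression residuals, and the seed, sample size, and value of sigma are arbitrary.

```python
import math
import random

random.seed(12345)          # arbitrary seed, for reproducibility only
sigma = 0.5                 # assumed true error standard deviation
n = 20000                   # large sample, so both factors settle down

# Stand-in for regression residuals: i.i.d. N(0, sigma^2) draws, centred at zero.
e = [random.gauss(0.0, sigma) for _ in range(n)]
mean_e = sum(e) / n
e = [v - mean_e for v in e]

s2 = sum(v ** 2 for v in e) / (n - 1)
normal_factor = math.exp(s2 / 2)                  # factor implied by equation (2)
smear_factor = sum(math.exp(v) for v in e) / n    # factor implied by equation (3)
```

With normal errors and a large sample, both factors should be close to exp(sigma^2 / 2). With markedly non-normal errors there is no reason for them to agree.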

Let's look at some examples to illustrate Duan's transformation, and to compare its performance with the usual one, in (2). In each case, the data I've used are available on the data page for this blog; and the EViews workfiles with my results are on the code page.

Let's begin with an example in which there is evidence that the model's errors are actually non-normal, so really equation (2) isn't appropriate. I have a model that explains the logarithm of women's wages as a function of the logarithm of women's years of education (WE), and the number of children they have between the ages of 6 and 18 years (K618):

$$\log(WW_i) = \beta_1 + \beta_2 \log(WE_i) + \beta_3 K618_i + \varepsilon_i .$$

The model is estimated by OLS, using observations 100 to 150 in my cross-section:

The Jarque-Bera test for normality yields a test statistic of JB = 51.1 (p = 0.0000).
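For anyone who wants to reproduce this kind of test outside EViews, here is a minimal Python sketch of the textbook Jarque-Bera statistic, JB = (n / 6)[S^2 + (K - 3)^2 / 4], where S and K are the sample skewness and kurtosis of the residuals. The function name is my own, and the small-sample conventions may differ from those used by EViews, so treat it as illustrative.

```python
import math

def jarque_bera(resid):
    """Return the JB statistic and its asymptotic chi-square(2) p-value."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n    # second central moment
    m3 = sum((e - mean) ** 3 for e in resid) / n    # third central moment
    m4 = sum((e - mean) ** 4 for e in resid) / n    # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)   # survival function of a chi-square with 2 d.f.
    return jb, p_value
```

The p-value uses the fact that the upper-tail probability of a chi-square variable with two degrees of freedom is exp(-x / 2), so no statistical library is needed.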

I've then generated (ex post) forecasts for observations 151 to 158. The true values of WW are given below, together with the various forecasts. The suffix "F" indicates that I've created a naive forecast - I've just taken the exponential of the forecast of log(WW). The suffix "FN" indicates that normal errors have been assumed, and equation (2) has been used to get the forecasts of WW. Finally, the suffix "FS" indicates that Duan's "smearing" method has been used, and the forecasts of WW come from applying the formula in (3).

Here's a plot of the same data:

The Root Mean Square Errors (RMSE's) of the forecasts are 1.53, 1.82, and 1.60 for WWF, WWFN, and WWFS respectively. So, in this case, the naive transformation is the winner, but Duan's transformation beats the one that assumes normal errors.
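The RMSE comparison itself is simple to code. In this Python sketch the actual values and the three sets of forecasts are made-up numbers, not the WW data; only the "F", "FN", and "FS" labels follow the naming used above.

```python
import math

def rmse(actual, forecast):
    """Root mean square error of a set of forecasts."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

# Made-up numbers for illustration only (not the WW data from the post).
actual = [4.0, 2.5, 6.0, 3.5]
forecasts = {
    "F": [3.2, 2.9, 4.8, 3.6],    # naive: exp of the log forecast
    "FN": [3.9, 3.5, 5.8, 4.4],   # normal-errors adjustment, equation (2)
    "FS": [3.7, 3.3, 5.5, 4.1],   # smearing adjustment, equation (3)
}
scores = {name: rmse(actual, f) for name, f in forecasts.items()}
best = min(scores, key=scores.get)   # label of the forecast with the smallest RMSE
```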

My second example also involves cross-section data. There are 1388 observations, 1378 of which are used for estimation purposes, and 10 are retained for ex post forecasting. 

The model explains the logarithm of the birth-weight of children, and here are the OLS results:

In this example, although the errors appear to be non-normal (JB = 5387.6; p = 0.0000), the sample is relatively large and there is very little difference between the results obtained with transformations (2) and (3). The suffixes, "F", "FN", and "FS" are used in the same way as in the first example above:

The forecast RMSE's for "F", "FN", and "FS" are 17.95, 18.12, and 18.10, respectively. So, the naive transformation again produces slightly better forecasts than either of the other two transformations over this forecast horizon. Duan's transformation marginally out-performs the transformation given in (2).

The third example involves a time-series application based on annual data. It's a demand for money equation, with M1 explained in terms of GNP and an interest rate, R.

Some preliminary testing shows that each series is I(1), and there are two cointegrating vectors. So, I'm just going to estimate an equation in the log-levels of the data, and interpret it as a long-run relationship. (None of this is terribly important to the main point of this post.)

The model that I've estimated is of the form:

$$\log(M1_t) = \beta_1 + \beta_2 \log(GNP_{t-1}) + \beta_3 \log(R_t) + \varepsilon_t .$$

The lag on the GNP variable ensures that the residuals are reasonably free of serial correlation.

Here are the basic results:

The Jarque-Bera test statistic is JB = 2.42 (p = 0.3), so I'm going to conclude that it's reasonable to assume that the model's errors are normally distributed. (Yes, I know this test has only asymptotic validity, and I have a pretty small sample - see here.) This means that equation (2) should be appropriate, although there's nothing wrong in using equation (3).

Having estimated the model over the period 1961 to 1980, let's now forecast M1 over the period 1981 to 1983. I have the actual M1 data for this out-of-sample period. The forecast suffixes, "F", "FN", and "FS" are used in the same way as in the first two examples above.

As you can see, in this particular case there's essentially no difference between the results based on any of the three transformations, and in two of the three forecast periods the naive transformation actually out-performs the other two.

Now let's consider a final example. This involves modelling library subscriptions for economics journals. The cross-section data come from Bergstrom (2001). The variables I'll use are:

SUBS Number of library subscriptions
PRICE Library subscription price
PAGES Number of pages in the 2000 volume
CITES Number of citations of the journal by authors.

The model that I've estimated is of the form:

$$\log(SUBS_i) = \beta_1 + \beta_2 \log(PRICE_i / CITES_i) + \beta_3 \log(PAGES_i) + \varepsilon_i .$$

The estimation results, based on the first 170 observations are:

Applying the Jarque-Bera test to the OLS residuals yields a test statistic of JB = 2.47 (p = 0.3).

When the model is then used to forecast observations 171 to 180, here's what we get. The suffixes, "F", "FN", and "FS" are used in the same way as in the first three examples above.

In this example, the forecast RMSE's are 544.8, 873.5, and 199.6 for SUBSF, SUBSFN, and SUBSFS respectively. Interestingly, Duan's transformation is the clear winner in this case - even though the JB test suggests that the regression errors are actually normally distributed!

It's important to keep in mind that the usual "adjustment" that's made when transforming logs-forecasts to levels-forecasts is based on the assumption that the errors in the log-model are normally distributed. If, in fact, this is not the case, then these adjustments won't necessarily be appropriate. Whether or not this will adversely affect forecast accuracy is somewhat data-dependent. 

It's also very important to keep in mind that these examples that I've given are merely illustrative of some of the situations that can arise in practice. 

Proceed with caution! And give serious consideration to Duan's "smearing transformation" if there is evidence that the regression errors may be non-normal - especially if the size of your (estimation) sample is modest.


Bergstrom, T. C., 2001. Free labour for costly journals? Journal of Economic Perspectives, 15, 183-198.

Bradu, D. and Y. Mundlak, 1970. Estimation in lognormal linear models. Journal of the American Statistical Association, 65, 198-211.

Duan, N., 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association, 78, 605-610.

Ebbeler, D. H., 1973. A note on large-sample approximation in lognormal linear models.  Journal of the American Statistical Association, 68, 231.

Evans, I. G. and S. A. Shaban, 1974. A note on estimation in lognormal models. Journal of the American Statistical Association, 69, 779-781.

Mehran, F., 1973. Variance of the MVUE for the lognormal mean. Journal of the American Statistical Association, 68, 726-727.

Meulenberg, M. T. G., 1965. On the estimation of  an exponential function. Econometrica, 33, 863-868.

Neyman, J. and E. Scott, 1960. Correction for bias introduced by a transformation of variables. Annals of Mathematical Statistics, 31,  643-655.

Shimizu, K. and K. Iwase, 1981. Uniformly minimum variance unbiased estimation in lognormal and related distributions. Communications in Statistics, A, 10, 1127-1147.

© 2014, David E. Giles


  1. Great post! It seems that the subtleties of back-transformation may easily go unnoticed; at least I did not think about them the first time I had to back-transform. Fortunately, there are posts like this where the small tricky parts face the daylight. One comment: even though your examples are just examples, I would still prefer having a little longer "out of sample" parts. Eight or ten observations seem quite few to draw conclusions from, especially when performances of more than two alternatives are to be compared and ranked.

    1. Thanks for the comment. Keep in mind that this is just illustrative - not a comprehensive study.

    2. In a related article BÅRDSEN, G. & LÜTKEPOHL, H. 2011. Forecasting levels of log variables in vector autoregressions. International Journal of Forecasting, 27, 1108-1115, the authors claim that "despite its theoretical advantages, the optimal forecast is shown to be inferior to the naive forecast if specification and estimation uncertainty are taken into account. Hence, in practice, using the exponential of the log forecast is preferable to using the optimal forecast." An interesting result! (Log-normality is assumed and smearing estimate is not considered.)

  2. My comments seem to be getting eaten on Chrome, so I apologize if this turns out to be a duplicate.

    I have two questions. It is my understanding that these sorts of transformations improve the mean prediction, but do not ensure that predictions for individual cases are particularly good. If so, why not use a GLM or het-robust Poisson regression?

    If you assume group-wise heteroskedasticity, you can smear by group, which relaxes the identically distributed errors assumption to a degree. What do you think of this practice?

    1. Thank you for this post! However I believe there is a small (but important!) typo in equation (3). It currently reads:

      yt** = exp{[log(yt)]* + (1 / n)Σ(exp(et))}

      Based on the Duan paper, I believe it should read,

      yt** = exp{[log(yt)]*}*(1 / n)Σ(exp(et)),

      or equivalently,

      yt** = exp{[log(yt)]* + log((1 / n)Σ(exp(et)))}

      As it is currently written, it is similar to adding 1 to log(yt) before exponentiating it, effectively doubling our predicted yt (roughly speaking). This is because the et's will on average be close to zero, so the average exp(et) will be something similar to 1. This is clearly not what we want to do.

    2. Brian - thanks very much. Fixed. DG