Sunday, July 1, 2018

Dummy Variables in a Semilogarithmic Regression: Exact Distributional Results

For better or worse, semilogarithmic regression models are used a lot in empirical economics. 

It would be nice to think that this is because the researcher found that a logarithmic transformation of the model's dependent variable led to residuals that were more "normally" distributed than without the transformation. Unfortunately, however, it's often just "for convenience". With this transformation, the estimates of the regression coefficients have a simple interpretation, as explained below.

I hate it when the latter situation arises. I've long since lost track of the number of times I've been at a seminar where the speaker has used this "simple interpretation" as an excuse for their choice of a semilogarithmic regression specification. For goodness' sake, the choice of the model's functional form should be based on more than "convenience"!

For some of my previous comments about this point, see this post.

Most of you will know that when our semilogarithmic model includes a dummy (zero-one) regressor, we have to be careful about how we interpret that regressor's estimated coefficient. Suppose that we have the following regression model, where D is a dummy variable, and the X's are regressors that are measured "continuously":

   ln(yi) = α + β Di + Σj γj Xji + εi    ;     E(εi) = 0   ;   i = 1, ..., n.

Note that there's no loss of generality here in having just one dummy variable in the model.

Then, the interpretation of the regression coefficients is:
  1. A one-unit change in Xj leads to a proportional change of approximately γj (or a percentage change of approximately 100γj) in y, other things being equal.
  2. When the dummy variable changes from D = 0 to D = 1, the proportional change in y is [exp(β) -1]. Conversely, going from D = 1 to D = 0 implies a proportional change in y of  [exp(-β) -1]. Again, multiply by 100 to get a percentage change.
See Halvorsen and Palmquist (1980) for an explanation of the second of these results, and my comments in this earlier post.
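To make the second result concrete, here is a minimal sketch of the arithmetic. The function names and the coefficient value 0.25 are purely illustrative assumptions, not taken from any of the papers cited here.

```python
import math

def dummy_effect(beta):
    """Proportional change in y when the dummy goes from D = 0 to D = 1."""
    return math.exp(beta) - 1.0

def reverse_dummy_effect(beta):
    """Proportional change in y when the dummy goes from D = 1 to D = 0."""
    return math.exp(-beta) - 1.0

beta = 0.25  # hypothetical value of the dummy's coefficient

print(f"Naive reading (100*beta):    {100 * beta:.2f}%")
print(f"D: 0 -> 1, 100*[exp(b)-1]:   {100 * dummy_effect(beta):.2f}%")
print(f"D: 1 -> 0, 100*[exp(-b)-1]:  {100 * reverse_dummy_effect(beta):.2f}%")
```

Notice that the two effects are not mirror images of each other: the "up" and "down" percentage changes differ in magnitude, even though applying one after the other takes y back to where it started.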

Kennedy (1981) and Giles (1982) discuss the issue of estimating this proportional change in the case of the dummy variable. Their results relate to point estimation - with a focus on unbiased estimation of the proportional change, when the model's errors are normally distributed.
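As a rough illustration of the point-estimation issue, Kennedy's (1981) estimator replaces exp(b) - 1 with exp(b - 0.5 V(b)) - 1, where b is the OLS estimate of β and V(b) is its estimated variance. The sketch below simply codes up that formula. The function names and the numerical inputs are hypothetical, chosen only to show the direction of the adjustment.

```python
import math

def naive_estimate(b):
    """Plug-in estimate of the proportional change: exp(b) - 1."""
    return math.exp(b) - 1.0

def kennedy_estimate(b, var_b):
    """Kennedy's (1981) almost unbiased estimate: exp(b - 0.5 * var_b) - 1.

    b     : OLS estimate of the dummy variable's coefficient
    var_b : estimated variance of b (the squared standard error)
    """
    return math.exp(b - 0.5 * var_b) - 1.0

b = 0.25        # hypothetical OLS estimate
se_b = 0.20     # hypothetical standard error
var_b = se_b ** 2

print(f"Plug-in estimate: {100 * naive_estimate(b):.2f}%")
print(f"Kennedy estimate: {100 * kennedy_estimate(b, var_b):.2f}%")
```

The adjustment always pulls the estimate down, and it matters more when the coefficient is imprecisely estimated.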

But what about interval estimation of this effect? 
For that we need three things: a point estimate; a "standard error"; and (very importantly) the sampling distribution of the point estimator. Van Garderen and Shah (2002) derive various results that provide us with the required standard error. Bryant and Wilhite (1989) and Derrick (1984) provide some related results. But that still leaves us without knowledge of the full sampling distribution.

And there's the difficulty, because [exp(β) -1] is a non-linear function of β. If the regression errors are normally distributed and we use OLS estimation, then the point estimator of β will also be normally distributed. However, the point estimator of [exp(β) -1] won't have any standard distribution!
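A quick simulation makes the consequences of the non-linearity easy to see. The sketch below assumes, purely for illustration, that the estimator of β is exactly normal with a known standard error, and then looks at what happens to exp(b) - 1. The parameter values (β = 0.25, a standard error of 0.4), the seed, and the helper names are all my own assumptions.

```python
import math
import random

def skewness(xs):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def simulate(beta, se, reps, seed=0):
    """Draw b ~ N(beta, se^2) and return the draws of b and of exp(b) - 1."""
    rng = random.Random(seed)
    b_draws = [rng.gauss(beta, se) for _ in range(reps)]
    effect_draws = [math.exp(b) - 1.0 for b in b_draws]
    return b_draws, effect_draws

beta, se = 0.25, 0.4  # hypothetical true coefficient and standard error
b_draws, effect_draws = simulate(beta, se, reps=20000)

true_effect = math.exp(beta) - 1.0
mean_effect = sum(effect_draws) / len(effect_draws)

print(f"Skewness of b:            {skewness(b_draws):.3f}")
print(f"Skewness of exp(b) - 1:   {skewness(effect_draws):.3f}")
print(f"True effect:              {true_effect:.3f}")
print(f"Mean of exp(b) - 1:       {mean_effect:.3f}")
```

The draws of b are symmetric, but the transformed draws are skewed to the right and centred above the true effect. That is exactly why a symmetric "estimate plus or minus two standard errors" interval can be misleading here.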

This is something that I discussed in a paper a few years ago (Giles, 2011).

I derived the exact distribution of Kennedy's almost unbiased estimator of the proportional change, [exp(β) -1]. You'll see from Theorem 1 of my paper that this distribution is really messy and the density is positively skewed in finite samples. Even with quite large samples it can be quite non-normal. I also showed that constructing bootstrap confidence intervals can be very effective in practice.
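For readers who want to try the bootstrap idea, here is a minimal sketch. It is not necessarily the scheme used in the 2011 paper: it uses a simple percentile interval based on resampling (y, D) pairs, applied to a stripped-down model with an intercept and a dummy only (no X's), so that OLS reduces to a difference in group means. The simulated data, the seed, and all of the function names are assumptions made for illustration.

```python
import math
import random

def ols_dummy(log_y, d):
    """OLS for ln(y) = a + b*D + e. Returns b and its estimated variance."""
    y1 = [v for v, di in zip(log_y, d) if di == 1]
    y0 = [v for v, di in zip(log_y, d) if di == 0]
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    rss = sum((v - m1) ** 2 for v in y1) + sum((v - m0) ** 2 for v in y0)
    s2 = rss / (n1 + n0 - 2)          # usual unbiased error-variance estimate
    return m1 - m0, s2 * (1.0 / n1 + 1.0 / n0)

def kennedy(b, var_b):
    """Kennedy's (1981) estimator of the proportional change."""
    return math.exp(b - 0.5 * var_b) - 1.0

def bootstrap_ci(log_y, d, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling (ln y, D) pairs."""
    rng = random.Random(seed)
    n = len(log_y)
    stats = []
    while len(stats) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        ds = [d[i] for i in idx]
        if sum(ds) < 2 or n - sum(ds) < 2:
            continue                  # skip resamples with a near-empty group
        ys = [log_y[i] for i in idx]
        stats.append(kennedy(*ols_dummy(ys, ds)))
    stats.sort()
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[int((1 + level) / 2 * reps) - 1]
    return lo, hi

# Simulated data (hypothetical): n = 40, beta = 0.25, error std. dev. = 0.5
rng = random.Random(123)
d = [i % 2 for i in range(40)]
log_y = [1.0 + 0.25 * di + rng.gauss(0.0, 0.5) for di in d]

point = kennedy(*ols_dummy(log_y, d))
lo, hi = bootstrap_ci(log_y, d)
print(f"Kennedy point estimate: {100 * point:.1f}%")
print(f"95% bootstrap interval: [{100 * lo:.1f}%, {100 * hi:.1f}%]")
```

With a right-skewed sampling distribution the interval will typically not be symmetric about the point estimate, which is something a normal-approximation interval cannot capture.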

The distributional results provided in that 2011 paper could also be used if you want to test hypotheses about the proportional change implied by a dummy variable's estimated coefficient. However, that's not something that I discussed explicitly.

So, be very careful indeed when you are estimating and interpreting the effects of dummy variables in semi-logarithmic regressions - especially if you're interested in anything more than basic point estimation.

(And yes, I'd better do something about getting that 2011 working paper published!)


Bryant, R. and A. Wilhite, 1989. Additional interpretations of dummy variables in semilogarithmic equations. Atlantic Economic Journal, 17, 87-88.

Derrick, F. W., 1984. Interpretation of dummy variables in semilogarithmic equations: Small sample implications. Southern Economic Journal, 50, 1185-1188.

Giles, D. E., 1982. The interpretation of dummy variables in semilogarithmic equations: Unbiased estimation. Economics Letters, 10, 77-79.

Giles, D. E., 2011. Interpreting dummy variables in semi-logarithmic regression models: Exact distributional results. Econometrics Working Paper EWP1101, Department of Economics, University of Victoria.

Halvorsen, R. and R. Palmquist, 1980. The interpretation of dummy variables in semilogarithmic equations. American Economic Review, 70, 474-475.

Kennedy, P. E., 1981. Estimation with correctly interpreted dummy variables in semilogarithmic equations. American Economic Review, 71, 801.

Van Garderen, K. J. and C. Shah, 2002. Exact interpretation of dummy variables in semilogarithmic equations. Econometrics Journal, 5, 149-159.

© 2018, David E. Giles
