Sunday, July 1, 2018

Dummy Variables in a Semilogarithmic Regression: Exact Distributional Results

For better or worse, semilogarithmic regression models are used a lot in empirical economics. 

It would be nice to think that this is because the researcher found that a logarithmic transformation of the model's dependent variable led to residuals that were more "normally" distributed than without the transformation. Unfortunately, however, it's often just "for convenience". With this transformation, the estimates of the regression coefficients have a simple interpretation, as explained below.

I hate it when the latter situation arises. I've long since lost track of the number of times I've been at a seminar where the speaker has used this "simple interpretation" as an excuse for their choice of a semilogarithmic regression specification. For goodness' sake, the choice of the model's functional form should be based on more than "convenience"!

For some of my previous comments about this point, see this post.

Most of you will know that when our semilogarithmic model includes a dummy (zero-one) regressor, we have to be careful about how we interpret that regressor's estimated coefficient. Suppose that we have the following regression model, where D is a dummy variable, and the X's are regressors that are measured "continuously":

   ln(yi) = α + β Di + Σj γj Xji + εi    ;     E(εi) = 0   ;   i = 1, ..., n.

Note that there's no loss of generality here in having just one dummy variable in the model.

Then, the interpretation of the regression coefficients is:
  1. A one-unit change in Xj leads to a proportional change of approximately γj (or a percentage change of approximately 100γj) in y, other things being equal.
  2. When the dummy variable changes from D = 0 to D = 1, the proportional change in y is [exp(β) -1]. Conversely, going from D = 1 to D = 0 implies a proportional change in y of  [exp(-β) -1]. Again, multiply by 100 to get a percentage change.
See Halvorsen and Palmquist (1980) for an explanation of the second of these results, and my comments in this earlier post.
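To see how much the second interpretation can differ from simply reading β as a proportional change, here is a minimal Python sketch. The value β = 0.25 is hypothetical, chosen only for illustration, and the function name `proportional_change` is my own, not something from the papers cited above.

```python
import math


def proportional_change(beta):
    """Proportional change in y implied by a dummy's coefficient.

    Returns a pair: (effect of D going 0 -> 1, effect of D going 1 -> 0).
    """
    switch_on = math.exp(beta) - 1.0    # D = 0 to D = 1
    switch_off = math.exp(-beta) - 1.0  # D = 1 to D = 0
    return switch_on, switch_off


# Hypothetical coefficient value, for illustration only.
beta = 0.25
up, down = proportional_change(beta)

print(f"beta read naively as a percentage: {100 * beta:.2f}%")
print(f"D = 0 -> 1: {100 * up:.2f}%")
print(f"D = 1 -> 0: {100 * down:.2f}%")
```

Note that the two effects are not symmetric: (1 + up)(1 + down) = 1, so the percentage fall when the dummy switches off is smaller in magnitude than the percentage rise when it switches on.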

Kennedy (1981) and Giles (1982) discuss the issue of estimating this proportional change in the case of the dummy variable. Their results relate to point estimation - with a focus on unbiased estimation of the proportional change, when the model's errors are normally distributed.
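Why is bias an issue here at all? If the estimator b of β is normally distributed with mean β and variance V, then E[exp(b)] = exp(β + V/2), so the "plug-in" estimator exp(b) - 1 overstates the proportional change on average. The sketch below compares the plug-in estimator with the adjustment that is usually attributed to Kennedy (1981), exp(b - V̂/2) - 1, where V̂ is the estimated variance of b. The formula is written from the form in which it is commonly cited, the numbers are hypothetical, and the function names are my own. This adjusted estimator is only approximately unbiased, and the exact unbiased estimator of Giles (1982) is not reproduced here.

```python
import math


def plug_in_effect(b_hat):
    """Plug-in estimate of the proportional change: exp(b) - 1."""
    return math.exp(b_hat) - 1.0


def adjusted_effect(b_hat, var_b_hat):
    """Variance-adjusted estimate: exp(b - 0.5 * V_hat) - 1.

    This is the form usually attributed to Kennedy (1981); treat it as an
    assumption of this sketch and check it against the paper before use.
    """
    return math.exp(b_hat - 0.5 * var_b_hat) - 1.0


# Hypothetical OLS output, for illustration only.
b_hat = 0.25   # estimated coefficient on the dummy
se_b_hat = 0.10  # its standard error
var_b_hat = se_b_hat ** 2

print(f"plug-in estimate:  {100 * plug_in_effect(b_hat):.2f}%")
print(f"adjusted estimate: {100 * adjusted_effect(b_hat, var_b_hat):.2f}%")
```

The adjustment always pulls the estimate down, and it matters more when the coefficient is imprecisely estimated (large V̂).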

But what about interval estimation of this effect?