Friday, July 6, 2018

Interpreting Dummy Variable Coefficients After Non-Linear Transformations

Dummy variables - ones that take only the values zero and one - are commonly used as regressors in regression models. I've devoted several posts to discussing various aspects of such variables, notably here, but also here, here, and here.

When the regression model in question is linear, in both the variables and the parameters, the interpretation of the coefficient of such a dummy variable is simple. Suppose that the model takes the form:

    yi = α + β Di + Σj γj Xji + εi    ;     E(εi) = 0   ;   i = 1, ...., n.                          (1)

The range of summation in the term on the right-hand side of (1) is from 1 to k, if there are k regressors in addition to the dummy variable, D. (There is no loss of generality in assuming a single dummy regressor in what follows, and no further distributional assumptions about the error term will be needed or used.)

As you'll know, if Di = 0, then the intercept coefficient in (1) is just α; and it shifts to (α + β) if Di = 1. It changes by an amount equal to β, and so does the predicted mean value of y. Conversely, the intercept (and the predicted mean value of y) changes by -β if Di changes from 1 to 0. Estimating (1) by OLS will give us an estimate of the effect on y of Di switching from 0 to 1 in value, or vice versa.
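To make this concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the parameter values, the single extra regressor X, the standard normal errors, and the seed are all hypothetical, and NumPy's least-squares routine stands in for whatever OLS software you prefer. It simply checks that the fitted mean of y shifts by exactly the estimated β when D is switched from 0 to 1 with the other regressor held fixed.

```python
import numpy as np

# Hypothetical data-generating process for model (1) with k = 1 extra regressor.
rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
n = 500
alpha, beta, gamma = 2.0, 1.5, 0.8      # illustrative "true" values (assumed)

D = rng.integers(0, 2, size=n).astype(float)   # dummy variable: 0 or 1
X = rng.normal(size=n)                         # one continuous regressor
eps = rng.normal(size=n)                       # zero-mean errors (normality not needed)
y = alpha + beta * D + gamma * X + eps

# OLS via least squares on the design matrix [1, D, X].
Z = np.column_stack([np.ones(n), D, X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
a_hat, b_hat, g_hat = coef

# Fitted means with D forced to 0 and to 1, holding X at its sample values.
Z0 = Z.copy()
Z0[:, 1] = 0.0
Z1 = Z.copy()
Z1[:, 1] = 1.0
shift = Z1 @ coef - Z0 @ coef           # change in fitted mean, observation by observation

print("estimated beta:", b_hat)
print("shift equals estimated beta for every i:", np.allclose(shift, b_hat))
```

The shift is the same for every observation, whatever the value of X, which is the sense in which the interpretation of β is "simple" in the fully linear model.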

But a bit more on estimation issues below!


Another way of interpreting what is going on is to think about the growth rate in the expected value of y that is implied when D changes its value. Setting Di = 0, and then Di = 1, this growth rate is:

   g01i = [ (α + β + Σj γj Xji) - (α + Σj γj Xji) ] / (α + Σj γj Xji) = [β / (α + Σj γj Xji)] ,

which you can multiply by 100 to convert it into a percentage rate of growth, if you wish. 

Note that this growth rate depends on the other parameters in the model, and also on the sample values for the other regressors.

Conversely, when D changes in value from 1 to 0, this growth rate is different, namely:

   g10i = - [β / (α + β + Σj γj Xji)]                            (i = 1, ...., n).

In this fully linear model these growth rates offer a somewhat less appealing way of summarizing what is going on than does the amount of change in the expected value of y. The latter doesn't depend on the other parameters of the model, or on the sample values of the regressors.
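The contrast between the constant change in E(y) and the observation-specific growth rates is easy to check numerically. The short Python sketch below uses made-up parameter values and three made-up values of a single regressor X (all assumptions, not estimates from any data set), and evaluates both growth rates directly from their definitions.

```python
import numpy as np

# Illustrative (assumed) parameter values and regressor values.
alpha, beta, gamma = 2.0, 1.5, 0.8
X = np.array([0.0, 1.0, 5.0])           # three hypothetical observations

m0 = alpha + gamma * X                  # E(y) when D = 0
m1 = alpha + beta + gamma * X           # E(y) when D = 1

change = m1 - m0                        # equals beta for every observation
g01 = (m1 - m0) / m0                    # growth rate when D goes 0 -> 1: beta / m0
g10 = (m0 - m1) / m1                    # growth rate when D goes 1 -> 0: -beta / m1

for xi, c, a, b in zip(X, change, g01, g10):
    print(f"X = {xi:4.1f}   change = {c:.2f}   g01 = {100 * a:6.2f}%   g10 = {100 * b:6.2f}%")
```

The change column is constant at β, while both growth rates vary with X, and g10 is not simply the negative of g01. The two are linked by g10i = -g01i / (1 + g01i), which follows directly from the two formulas above.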

However, this situation can change very quickly once we move to a regression model that is non-linear, either in the variables or in the parameters (or both). 

That's what I want to focus on in this post. 

Let's consider some interesting examples that involve common transformations of the dependent variable in a regression model. Apart from anything else, such transformations are often undertaken to make the assumption of a normally distributed error term more reasonable.