Econometrics Beat: Dave Giles' Blog: Interpreting Dummy Variable Coefficients After Non-Linear Transformations

Friday, July 6, 2018

Interpreting Dummy Variable Coefficients After Non-Linear Transformations

Dummy variables - ones that take only the values zero and one - are commonly used as regressors in regression models. I've devoted several posts to discussing various aspects of such variables, notably here, but also here, here, and here.

When the regression model in question is linear, in both the variables and the parameters, the interpretation of the coefficient of such a dummy variable is simple. Suppose that the model takes the form:

y_i = α + β D_i + Σ_j γ_j X_ji + ε_i; E(ε_i) = 0 ; i = 1, ...., n. (1)

The range of summation in the term on the right-hand side of (1) is from 1 to k, if there are k regressors in addition to the dummy variable, D. (There is no loss of generality in assuming a single dummy regressor in what follows, and no further distributional assumptions about the error term will be needed or used.)

As you'll know, if D_i = 0, then the intercept coefficient in (1) is just α; and it shifts to (α + β) if D_i = 1. It changes by an amount equal to β, and so does the predicted mean value of y. Conversely, the predicted mean value changes by -β if D_i changes from 1 to 0. Estimating (1) by OLS will give us an estimate of the effect on y of D_i switching from 0 to 1 in value, or vice versa.

But a bit more on estimation issues below!

Another way of interpreting what is going on is to think about the growth rate in the expected value of y that is implied when D changes its value. Setting D_i = 0, and then D_i = 1, this growth rate is:

g_01i = [ (α + β + Σ_j γ_j X_ji) - (α + Σ_j γ_j X_ji)] / (α + Σ_j γ_j X_ji) = [β / (α + Σ_j γ_j X_ji)] ,

which you can multiply by 100 to convert it into a percentage rate of growth, if you wish.

Note that this growth rate depends on the other parameters in the model, and also on the sample values for the other regressors.

Conversely, when D changes in value from 1 to 0, this growth rate is different, namely:

g_10i = - [β / (α + β + Σ_j γ_j X_ji)] (i = 1, ...., n).

In this fully linear model these growth rates offer a somewhat less appealing way of summarizing what is going on than does the amount of change in the expected value of y. The latter doesn't depend on the other parameters of the model, or on the sample values of the regressors.
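To make the data-dependence concrete, here is a minimal Python sketch of the two growth rates for the fully linear model (1). The function name and all of the parameter and regressor values are hypothetical, chosen only for illustration.

```python
def linear_growth_rates(alpha, beta, gammas, x):
    """Growth rates in E(y) for model (1) when D switches 0 -> 1 and 1 -> 0.

    gammas and x are equal-length sequences holding the coefficients and the
    values of the non-dummy regressors for one observation.
    """
    base = alpha + sum(g * xj for g, xj in zip(gammas, x))  # E(y | D = 0)
    g01 = beta / base            # D: 0 -> 1
    g10 = -beta / (base + beta)  # D: 1 -> 0
    return g01, g10


# Hypothetical values, for illustration only.
g01_a, g10_a = linear_growth_rates(alpha=2.0, beta=0.5, gammas=[1.0], x=[3.0])
g01_b, g10_b = linear_growth_rates(alpha=2.0, beta=0.5, gammas=[1.0], x=[8.0])

# Same beta, different X value: the growth rates differ.
print(g01_a, g10_a)
print(g01_b, g10_b)
```

The absolute change in the expected value of y is 0.5 in both calls, but the growth rates are not the same, which is the point made above.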

However, this situation can change very quickly once we move to a regression model that is non-linear, either in the variables or in the parameters (or both).

That's what I want to focus on in this post.

Let's consider some interesting examples that involve common transformations of the dependent variable in a regression model. Apart from anything else, such transformations are often undertaken to make the assumption of a normally distributed error term more reasonable.

Transforming the Dependent Variable

The following discussion is quite extensive. I've deliberately worked through the various examples in some detail, so that you can see exactly where the various results come from. However, I'll be uploading a separate, brief, "Summary of Results" here, shortly.

1. Logarithmic Dependent Variable

First, suppose that the model, while still linear in the parameters, is non-linear in y, through a (natural) logarithmic transformation -

log(y_i) = α + β D_i + Σ_j γ_j X_ji + ε_i; E(ε_i) = 0 ; y_i > 0 ; i = 1, ...., n. (2)

While β still represents the shift in the intercept of the model, its value is no longer the shift in the expected value of y. However, the growth rate in the expected value of y now has a very simple expression, as was discussed by Halvorsen and Palmquist (1980), for example.

From (2), we can write:

y_i= exp[α + β D_i + Σ_j γ_j X_ji + ε_i] . (3)

Then, using the same notation as above, after setting D_i = 0 and D_i = 1 in (3), and setting ε_i to its mean value of zero, we immediately have:

g₀₁ = exp(β) - 1

and

g₁₀ = exp(-β) - 1 .

The "i" subscript on the left-hand side of these last two expressions has been suppressed, as these rates are independent of the sample values. In addition, they depend only on the value of the coefficient of the dummy variable, and not on the other coefficients in the model. That's very convenient, but this alone shouldn't be used to justify using this log-linear model. (See my earlier posts, here and here.)

Also, as we'll see below, this is a rather unusual situation, and we should be careful not to presume that this applies when we use other non-linear transformations of the model's dependent variable.
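As a quick numerical illustration of the semilogarithmic result, the following sketch evaluates exp(β) - 1 and exp(-β) - 1. The value β = 0.2 is hypothetical, and the function name is mine.

```python
import math


def semilog_growth_rates(beta):
    """Growth rates in y implied by the dummy's coefficient in model (2)."""
    g01 = math.exp(beta) - 1.0   # D: 0 -> 1
    g10 = math.exp(-beta) - 1.0  # D: 1 -> 0
    return g01, g10


beta = 0.2  # hypothetical coefficient value
g01, g10 = semilog_growth_rates(beta)
print(f"g01 = {100 * g01:.2f}%, g10 = {100 * g10:.2f}%")
```

With β = 0.2 the implied rates are roughly 22.1% and -18.1%, rather than ±20%. No regressor values or other coefficients are needed, and (1 + g₀₁)(1 + g₁₀) = 1 by construction.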

2. Box-Cox Transformation

Next, suppose we have a model in which the basic Box and Cox (1964) transformation has been applied to the dependent variable:

y_i* = [(y_i^λ - 1) / λ] ; if λ ≠ 0

y_i* = log(y_i) ; if λ = 0 .

Clearly, the case where λ = 0 has been dealt with above, so let's focus on λ ≠ 0.

The regression model that we'll consider is:

y_i* = α + β D_i + Σ_j γ_j X_ji + ε_i; E(ε_i) = 0 ; i = 1, ...., n. (4)

This model is non-linear in the parameter λ, and the interpretation of the impact of the dummy variable on y is not straightforward.

Note that we can re-write (4) as:

y_i = [1 + λ(α + β D_i + Σ_j γ_j X_ji + ε_i)]^(1/λ). (5)

Once again, using the same notation as above, after assigning D_i = 0 and D_i = 1 in (5), and setting the error term to (its mean value of) zero, we immediately have:

g_01i = {[1 + λ(α + β + Σ_j γ_j X_ji)] / [1 + λ(α + Σ_j γ_j X_ji)]}^(1/λ) - 1

and

g_10i = {[1 + λ(α + Σ_j γ_j X_ji)] / [1 + λ(α + β + Σ_j γ_j X_ji)]}^(1/λ) - 1 ; i = 1, 2, ...., n. (6)

As in the case of the fully linear model, these rates of growth associated with the dummy variable are both data-dependent, and they depend on all of the model's parameters.
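The growth rates in (6) can be sketched in Python by back-transforming with (5), error set to zero, and taking ratios. The function names and the numerical values are hypothetical, and the sketch assumes λ ≠ 0 and 1 + λ(·) > 0 so that the power is well defined.

```python
import math


def boxcox_inverse(eta, lam):
    """Equation (5) with the error set to zero; assumes lam != 0, 1 + lam*eta > 0."""
    return (1.0 + lam * eta) ** (1.0 / lam)


def boxcox_growth_rates(alpha, beta, gammas, x, lam):
    """Growth rates (6) for one observation with regressor values x."""
    eta0 = alpha + sum(g * xj for g, xj in zip(gammas, x))  # D = 0
    y0 = boxcox_inverse(eta0, lam)
    y1 = boxcox_inverse(eta0 + beta, lam)                   # D = 1
    return y1 / y0 - 1.0, y0 / y1 - 1.0


# Hypothetical values, for illustration only.
g01, g10 = boxcox_growth_rates(alpha=2.0, beta=0.5, gammas=[1.0], x=[3.0], lam=1.0)
print(g01, g10)

# As lam approaches zero, g01 approaches the semilog value exp(beta) - 1.
g01_small, _ = boxcox_growth_rates(2.0, 0.5, [1.0], [3.0], lam=1e-6)
print(g01_small, math.exp(0.5) - 1.0)
```

The second call is only a sanity check: for a very small λ the Box-Cox result is close to the semilogarithmic one from section 1.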

3. Box-Cox Transformation With a Location Shift

A major disadvantage of the basic Box-Cox transformation is that it can't be applied to data that take negative values. One way of adapting the Box-Cox transformation to take this into account is to introduce a second parameter - a location parameter that effectively allows us to "shift" the origin of the data.

With this modification, we have:

(i) y_i* = [((y_i + λ₂) ^λ₁ - 1) / λ₁] ; if λ₁ ≠ 0
(ii) y_i* = log(y_i+ λ₂) ; if λ₁ = 0, and y_i > - λ₂ .

Again, our regression model is:

  y_i* = α + β D_i + Σ_j γ_j X_ji + ε_i; E(ε_i) = 0 ; i = 1, ...., n. (7)

The model is non-linear in the two parameters λ₁ and λ₂, and again the interpretation of the impact of the dummy variable on the expected value of y isn't trivial.

Substituting for y_i* in (7), setting the error to its mean value of zero, and solving for y_i, we can re-write the model as:

(i)   y_i = [1 + λ₁(α + β D_i + Σ_j γ_j X_ji)]^(1/λ₁) - λ₂   ; if  λ₁ ≠ 0

(ii)   y_i = exp[α + β D_i + Σ_j γ_j X_ji] - λ₂ ; if  λ₁ = 0, and  y_i > - λ₂ . (8)

The expressions for the growth rates associated with "switching" the dummy variable's values are:

(i)     g_01i = {[1 + λ₁(α + β + Σ_j γ_j X_ji)]^(1/λ₁) - λ₂} / {[1 + λ₁(α + Σ_j γ_j X_ji)]^(1/λ₁) - λ₂} - 1

g_10i = {[1 + λ₁(α + Σ_j γ_j X_ji)]^(1/λ₁) - λ₂} / {[1 + λ₁(α + β + Σ_j γ_j X_ji)]^(1/λ₁) - λ₂} - 1
; if  λ₁ ≠ 0
(ii)    g_01i = {[exp(α + β + Σ_j γ_j X_ji) - λ₂] / [exp(α + Σ_j γ_j X_ji) - λ₂]} - 1

g_10i = {[exp(α + Σ_j γ_j X_ji) - λ₂] / [exp(α + β + Σ_j γ_j X_ji) - λ₂]} - 1
; if  λ₁ = 0, and  y_i > - λ₂ .

(i = 1, 2, ...., n). (9)

Of course, if we set λ₂ = 0 in these growth rate expressions, we get the growth rates associated with the regular Box-Cox model, and the semilogarithmic model, given in sections 2 and 1 above.

Note that this modified Box-Cox transformation comes with a "cost". There are now two additional parameters that have to be estimated, along with the regression coefficients.
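A small Python sketch can be used to check the shifted Box-Cox growth rates numerically. It works directly from the back-transformed values in (8), with the error set to zero, and the argument eta0 stands for α + Σ_j γ_j X_ji. The function names and all numbers are hypothetical.

```python
import math


def shifted_boxcox_inverse(eta, lam1, lam2):
    """Back-transform as in (8): power (or exp) of the linear index, minus lam2."""
    if lam1 == 0:
        return math.exp(eta) - lam2
    return (1.0 + lam1 * eta) ** (1.0 / lam1) - lam2


def shifted_boxcox_growth_rates(eta0, beta, lam1, lam2):
    """Growth rates in y when D switches 0 -> 1 and 1 -> 0."""
    y0 = shifted_boxcox_inverse(eta0, lam1, lam2)         # D = 0
    y1 = shifted_boxcox_inverse(eta0 + beta, lam1, lam2)  # D = 1
    return y1 / y0 - 1.0, y0 / y1 - 1.0


# Hypothetical values, for illustration only.
g01_shift, _ = shifted_boxcox_growth_rates(eta0=1.0, beta=0.5, lam1=0.5, lam2=1.0)
g01_plain, _ = shifted_boxcox_growth_rates(eta0=1.0, beta=0.5, lam1=0.5, lam2=0.0)
g01_log, _ = shifted_boxcox_growth_rates(eta0=1.0, beta=0.5, lam1=0.0, lam2=0.0)

print(g01_shift, g01_plain, g01_log)
```

Setting lam2 to zero reproduces the regular Box-Cox rate, and setting both lam1 and lam2 to zero reproduces exp(β) - 1 from the semilogarithmic model.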

4. Inverse Hyperbolic Sine Transformation

A more general transformation that has been suggested for dealing with data which may be negative, as well as positive, is the inverse hyperbolic sine function.

If this transformation is applied to the y_i data, we have

y_i* = sinh^-1(y_i) = log[y_i+ √(1 + y_i²)] . (10)

As before, our regression model is:

y_i* = α + β D_i + Σ_j γ_j X_ji + ε_i; E(ε_i) = 0 ; i = 1, ...., n. (11)

Notice that in this case we don't have to add additional parameters to the model's specification, which is advantageous relative to the situation with the modified Box-Cox transformation.

Substituting (11) in (10), setting the error term to zero, and solving for y_i, we get:

y_i = 0.5 {exp( α + β D_i + Σ_j γ_j X_ji) - exp[ -(α + β D_i + Σ_j γ_j X_ji)]} . (12)

Then, if we set D_i = 0 and D_i = 1 in (12), and compute the implied growth rates in (the expected value of) y_i, the results that we get are:

g_01i = {exp(α + Σ_j γ_j X_ji)[exp(β ) - 1] - exp[ -(α + Σ_j γ_j X_ji)][exp(-β ) - 1]} / B_01i

and

g_10i = {exp(α + Σ_j γ_j X_ji)[1 - exp(β )] - exp[ -(α + Σ_j γ_j X_ji)][1 - exp(-β )]} / B_10i

where:

B_01i= exp(α + Σ_j γ_j X_ji) - exp[ -(α + Σ_j γ_j X_ji)]

B_10i= exp( α + β + Σ_j γ_j X_ji) - exp[ -(α + β + Σ_j γ_j X_ji)] ; i = 1, 2, ....., n.

(13)

Yet again, the growth rates depend on the values of all of the model's parameters, as well as on the observed X values.
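Because (12) is just y_i = sinh(α + β D_i + Σ_j γ_j X_ji), the growth rates in (13) can be checked numerically against a direct ratio of hyperbolic sines. The sketch below does this in Python; the function names and the values of eta0 (standing for α + Σ_j γ_j X_ji) and β are hypothetical.

```python
import math


def ihs_growth_rates(eta0, beta):
    """Growth rates from (12), using y = sinh(linear index); assumes sinh != 0."""
    y0 = math.sinh(eta0)         # D = 0
    y1 = math.sinh(eta0 + beta)  # D = 1
    return y1 / y0 - 1.0, y0 / y1 - 1.0


def ihs_growth_rates_eq13(eta0, beta):
    """The same growth rates, written out term by term as in (13)."""
    b01 = math.exp(eta0) - math.exp(-eta0)
    b10 = math.exp(eta0 + beta) - math.exp(-(eta0 + beta))
    g01 = (math.exp(eta0) * (math.exp(beta) - 1.0)
           - math.exp(-eta0) * (math.exp(-beta) - 1.0)) / b01
    g10 = (math.exp(eta0) * (1.0 - math.exp(beta))
           - math.exp(-eta0) * (1.0 - math.exp(-beta))) / b10
    return g01, g10


# Hypothetical values, for illustration only.
direct = ihs_growth_rates(eta0=0.8, beta=0.3)
explicit = ihs_growth_rates_eq13(eta0=0.8, beta=0.3)
print(direct, explicit)

# When the linear index is large, g01 is close to the semilog value exp(beta) - 1.
g01_large, _ = ihs_growth_rates(eta0=10.0, beta=0.2)
print(g01_large, math.exp(0.2) - 1.0)
```

The last comparison reflects the fact that sinh(z) behaves like 0.5 exp(z) for large z, so the semilogarithmic result is recovered only as an approximation, and only for large values of the linear index.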

5. Yeo-Johnson Transformation

As a final example, let's look at the Yeo-Johnson (2000) power transformation of the dependent variable. This takes the following form:

(i) y_i* = [(y_i + 1)^λ - 1] / λ ; if λ ≠ 0 and y_i ≥ 0

(ii) y_i* = log(y_i + 1) ; if λ = 0 and y_i ≥ 0

(iii) y_i* = - [(1 - y_i)^(2 - λ) - 1] / (2 - λ) ; if λ ≠ 2 and y_i < 0

(iv) y_i* = - log(1 - y_i) ; if λ = 2 and y_i < 0

(You can see that the first two cases for this transformation relate to the modified Box-Cox and Box-Cox transformations, respectively.)

Writing our regression model as in (11), but with this new definition of y_i*, we can proceed in the same manner as above:

We substitute each of the expressions for y_i*, in turn, into (11);
We set the error term to zero, and solve for y_i itself;
We evaluate the resulting expressions when the dummy variable takes its two possible values;
We compute the two growth rate expressions.

The resulting growth rate expressions associated with the "switching" of the dummy variable between values of "0" and "1", or vice versa, are (for i = 1, 2, ...., n):

(i)    g_01i = {[1 + λ(α + β + Σ_j γ_j X_ji)]^(1/λ) - 1} / {[1 + λ(α + Σ_j γ_j X_ji)]^(1/λ) - 1} - 1

g_10i = {[1 + λ(α + Σ_j γ_j X_ji)]^(1/λ) - 1} / {[1 + λ(α + β + Σ_j γ_j X_ji)]^(1/λ) - 1} - 1

; if  λ ≠ 0 and y_i ≥ 0

(ii)   g_01i = {[exp(α + β + Σ_j γ_j X_ji) - 1] / [exp(α + Σ_j γ_j X_ji) - 1]} - 1

g_10i = {[exp(α + Σ_j γ_j X_ji) - 1] / [exp(α + β + Σ_j γ_j X_ji) - 1]} - 1

; if  λ = 0 and  y_i ≥ 0

(iii)  g_01i = {1 - [1 - (2 - λ)(α + β + Σ_j γ_j X_ji)]^(1/(2 - λ))} / {1 - [1 - (2 - λ)(α + Σ_j γ_j X_ji)]^(1/(2 - λ))} - 1

g_10i = {1 - [1 - (2 - λ)(α + Σ_j γ_j X_ji)]^(1/(2 - λ))} / {1 - [1 - (2 - λ)(α + β + Σ_j γ_j X_ji)]^(1/(2 - λ))} - 1

; if  λ ≠ 2 and y_i < 0

(iv)   g_01i = {1 - exp[-(α + β + Σ_j γ_j X_ji)]} / {1 - exp[-(α + Σ_j γ_j X_ji)]} - 1

g_10i = {1 - exp[-(α + Σ_j γ_j X_ji)]} / {1 - exp[-(α + β + Σ_j γ_j X_ji)]} - 1

; if  λ = 2 and y_i < 0

(14)

Of course, all of these growth rates depend, yet again, on the values of all of the parameters in the model, and vary according to the sample values taken by the regressors.
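The four steps above can be sketched in Python for the Yeo-Johnson case. The sketch solves each branch of the transformation for y_i and then forms the growth rates from the back-transformed values. One assumption is mine rather than the post's: the branch is chosen by the sign of the linear index, which relies on the transformation being increasing in y_i with y_i* = 0 at y_i = 0. The function names and numbers are hypothetical.

```python
import math


def yeo_johnson(y, lam):
    """Forward Yeo-Johnson transformation, cases (i) to (iv)."""
    if y >= 0:
        return math.log(y + 1.0) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def yeo_johnson_inverse(eta, lam):
    """Solve y* = eta for y; the branch follows the sign of eta (an assumption)."""
    if eta >= 0:
        if lam == 0:
            return math.exp(eta) - 1.0
        return (1.0 + lam * eta) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-eta)
    return 1.0 - (1.0 - (2.0 - lam) * eta) ** (1.0 / (2.0 - lam))


def yeo_johnson_growth_rates(eta0, beta, lam):
    """Growth rates in y when D switches 0 -> 1 and 1 -> 0 (error set to zero)."""
    y0 = yeo_johnson_inverse(eta0, lam)         # D = 0
    y1 = yeo_johnson_inverse(eta0 + beta, lam)  # D = 1
    return y1 / y0 - 1.0, y0 / y1 - 1.0


# Round-trip checks on hypothetical values, one per branch sign.
back_pos = yeo_johnson_inverse(yeo_johnson(3.0, 0.5), 0.5)
back_neg = yeo_johnson_inverse(yeo_johnson(-3.0, 0.5), 0.5)
print(back_pos, back_neg)

g01, g10 = yeo_johnson_growth_rates(eta0=1.0, beta=0.5, lam=0.5)
print(g01, g10)
```

If switching the dummy moves the linear index across zero, the two back-transformed values come from different branches, which the closed-form expressions in (14) do not cover; the sketch simply evaluates whichever branch applies at each value.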

The Naive Level-Break Model

As a bit of an aside, let's consider the case where the model includes an intercept and a dummy variable, but no other regressors. In other words, the dependent variable has just a "breaking (mean) level":

  y_i* = α + β D_i + ε_i; E(ε_i) = 0 ; i = 1, ...., n. (15)

You might be thinking, "this isn't a very interesting/realistic model", and it's not! But bear with me - you'll see in a moment why I want to talk about this rather special situation.

Note the following:

If y_i* = y_i, so that the model is fully linear, β itself has the same interpretation as before, and (unsurprisingly) the growth rates no longer depend on the sample values. Specifically, we then have g₀₁ = [β / α] , and g₁₀ = - [β / (α + β)].
If y_i* = log(y_i), the expressions for the two growth rates remain the same as those given above for the case where other regressors enter the model. This is a very special result!
In all of the other transformations that we've considered, the growth rates simplify in obvious ways. They're no longer the same as the ones we derived earlier.

Here's why this matters.

For all of these other transformations (2 to 5, above), if you derive the growth rates using just the level-break model, (15), you obtain formulae that are wrong, if what you're really interested in is a model that includes other regressors.

Ouch!
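To see how much this can matter, here is a small Python comparison for the inverse hyperbolic sine case. It evaluates the 0-to-1 growth rate once using the level-break formula (no other regressors) and once using the full-model formula. All parameter values, and the single regressor value, are hypothetical.

```python
import math


def ihs_g01(index_d0, beta):
    """0 -> 1 growth rate in y = sinh(index) when the index rises by beta."""
    return math.sinh(index_d0 + beta) / math.sinh(index_d0) - 1.0


# Hypothetical values, for illustration only.
alpha, beta = 0.1, 0.5
gamma, x = 0.5, 2.0  # one extra regressor, so gamma * x = 1.0

g01_level_break = ihs_g01(alpha, beta)            # formula derived from (15)
g01_full_model = ihs_g01(alpha + gamma * x, beta)  # formula derived from (11)

print(g01_level_break, g01_full_model)
```

With these particular numbers the two answers differ by a wide margin, even though α and β are identical in the two calculations.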

As a case in point, an unpublished paper by Lachowska (2017) falls into precisely this trap in the context of the inverse hyperbolic sine transformation! A recent paper by Bellemare and Wichman (2018), which is discussed in one of Marc Bellemare's blog posts, uses Lachowska's incorrect result in one part of its own discussion. An unwary reader might easily infer too much from that discussion.

Some Estimation Issues

All of the growth rates derived above are expressed in terms of the true, unknown, values of certain parameters (and in many cases in terms of the sample values of the non-dummy regressors). Estimating the implied growth rates involves inserting estimates of these parameters into the various growth rate formulae.

This raises a number of issues. While these issues aren't my primary concern here, some comments are certainly in order.

There's an established literature concerning the properties of various estimators of the growth rates implied by the dummy variable in the semilogarithmic model. For instance, see Kennedy (1981), Giles (1982), and my recent blog post, here. Keep in mind that these results require an assumption that the model's error term is normally distributed.

The papers by Burbidge et al. (1988) and MacKinnon and Magee (1990) provide some insights into various aspects of inference in general in the context of the inverse hyperbolic sine transformation. Lachowska's (2017) results regarding dummy variable growth rates after this transformation are correct for the (uninteresting) level-break model, (15), but incorrect for the full regression model, (11).

Take-Aways

There are several take-away points from this post, including:

We must be very careful when interpreting the impact/role of a dummy variable in a regression model where the dependent variable has been transformed in some non-linear way.
The correct interpretation depends crucially on the specific transformation that's been used.
Often, it's helpful to express this interpretation in terms of the growth rate (or percentage change) implied for the (mean of the) dependent variable when the dummy variable "switches" between its values of zero and one.
In most cases, these growth rates depend on all of the unknown parameters in the regression model, as well as on the sample-specific values of the regressors.
Measuring these growth rates involves estimating the model's unknown parameters and assigning values to the regressors. The latter could be achieved by using sample mean values.
Unless the parameter estimates that are "inserted" into the various growth rate formulae are chosen very judiciously, there are no guarantees with regard to the quality of the resulting estimated growth rates in small to moderate-sized samples.
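As a rough sketch of the "plug-in" calculation described in these points, the following Python code evaluates the inverse hyperbolic sine growth rate at the sample means of the regressors, and also observation by observation. The estimates and the tiny data set are made up purely for illustration, and nothing here addresses the sampling properties of the resulting estimator.

```python
import math
from statistics import mean


def ihs_g01(index_d0, beta):
    """0 -> 1 growth rate in y = sinh(index) when the index rises by beta."""
    return math.sinh(index_d0 + beta) / math.sinh(index_d0) - 1.0


# Hypothetical estimates and regressor data (3 observations, 2 regressors).
alpha_hat, beta_hat = 0.5, 0.3
gamma_hat = [0.2, -0.1]
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]]

# Option 1: assign the regressors their sample mean values.
x_bar = [mean(col) for col in zip(*X)]
index_bar = alpha_hat + sum(g * xj for g, xj in zip(gamma_hat, x_bar))
g01_at_means = ihs_g01(index_bar, beta_hat)

# Option 2: one estimated growth rate per observation.
g01_by_obs = [
    ihs_g01(alpha_hat + sum(g * xj for g, xj in zip(gamma_hat, row)), beta_hat)
    for row in X
]

print(g01_at_means)
print(g01_by_obs)
```

The per-observation values show how much the estimated growth rate varies across the sample, which a single number evaluated at the means hides.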

I'm sure that readers would like to see some empirical applications of some of these results - these will be coming up in due course!

And don't forget - I'll be uploading a separate, brief, "Summary of Results" shortly.

References

Bellemare, M.F. and C. Wichman, 2018. Elasticities and the inverse hyperbolic sine transformation. Mimeo., Department of Applied Economics, University of Minnesota.

Box, G. E. P. and D. R. Cox, 1964. An analysis of transformations. Journal of the Royal Statistical Society, B, 26, 211–252.

Burbidge, J. B., L. Magee, and L. Robb, 1988. Alternative transformations to handle extreme values of the dependent variable. Journal of the American Statistical Association, 83, 123-127.

Giles, D. E., 1982. The interpretation of dummy variables in semilogarithmic equations: Unbiased estimation. Economics Letters, 10, 77-79.

Halvorsen, R. and R. Palmquist, 1980. The interpretation of dummy variables in semilogarithmic equations. American Economic Review, 70, 474-475.

Kennedy, P. E., 1981. Estimation with correctly interpreted dummy variables in semilogarithmic equations. American Economic Review, 71, 801.

Lachowska, M., 2017. A note on the approximate interpretation of dummy variable coefficients in inverse hyperbolic sine regressions. Mimeo., W. E. Upjohn Institute for Employment Research.

MacKinnon, J. G. and L. Magee, 1990. Transforming the dependent variable in regression models. International Economic Review, 31, 315-339.

Yeo, I-K. and R. Johnson, 2000. A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954-959.

3 comments:

Dave Giles, July 9, 2018 at 11:58 AM
Marta Lachowska emailed me today as follows:
"I wanted to say that the paper you refer to on your blog (http://davegiles.blogspot.com/2018/07/interpreting-dummy-variable.html)—Lachowska (2017)—was a draft that I circulated for comments. When I learned that the approximation worked poorly in general—back in April—, I took it down from my website. I don't stand by the paper, but unfortunately it may still come up when people search for the title."

Thanks for clarifying, Marta. I should point out that the link to your paper still worked fine when I tested it on the day this post went out. It seems that you have taken it down since then.
Marta Lachowska, July 10, 2018 at 2:40 PM
To clarify, I deleted the link on my research webpage back in April. At that point, the paper no longer appeared on my research webpage. I naively believed that, because the link was gone, so was the uploaded PDF. But as Dave Giles has said to me in an email, links to files and the files themselves are two different things. Clearly, the URL with the PDF remained available in my domain for search engines to find, as I discovered when I clicked on the link in Dave's blog. At that time, I contacted Weebly to have the PDF removed.
Anonymous, July 11, 2018 at 2:08 PM
To clarify, I deleted the link on my research webpage back in April. At that point, the paper no longer appeared on my research webpage. I mistakenly believed that, because the link was gone, so was the uploaded PDF. Clearly, the URL with the PDF remained available in my domain for search engines to find, as I discovered when I clicked on the link in Dave's blog. At that time, I contacted the website provider to have the PDF removed.
Marta Lachowska