The Devil made me do it! It just had to happen sooner or later, and no one else seemed to be willing to bite the bullet. So, I figured it was up to me. I've written the definitive addition to the *Dummies* series:
O.K., I'm (uncharacteristically) exaggerating just a tad. I think the cover looks good, though; and I've even started to assemble some of the core material, as you'll see below. So what brought on this fit of enthusiasm for what at first blush might be misinterpreted as the neuronically challenged?

Well, last week I gave a talk in the Statistics seminar series in the Math. & Stats. department here at UVic, and this week I gave a similar talk in my own department's brown-bag seminar. The first of those talks was titled "Interpreting Indicator Covariates in Semi-logarithmic Regression Models". The talk for the Economics department was more succinctly called "Dummies for Dummies". The content of the two talks was pretty much the same, but I had to take into account a couple of differences in the language used by econometricians and statisticians. We say "regressor", and they say "covariate". They say "indicator variable", and we say "dummy variable",........ you get the picture. There's another difference too - statisticians don't need to be cajoled into attending seminars by giving the talk a provocative (and possibly insulting) title!

On this occasion the economists noticeably self-selected, and there was a healthy turnout of the curious and homeless. Regrettably we can't afford to hand out free lunches at seminars in the way our colleagues in the Business School purport to. Perhaps it's because we know that such things don't exist! People actually turned up in spite of this. Curiosity got the better of them.

These seminars were based on a recently completed research paper of mine (**Giles**, 2011a). The main point of that paper is to derive the exact sampling distribution of a particular statistic that arises naturally when estimating a log-linear regression model with one or more dummy variables as regressors. The paper also shows what can go wrong if you don't do the job properly when interpreting that statistic - but more on this below.
Dummy variables are quite alluring when it comes to including them in regression models. However, they're rather special in certain ways. So, here are four things that your mother probably never taught you, but which will form the cornerstones of the forthcoming tome, *Dummies for Dummies*. Meanwhile, you keen users of dummy variables may want to keep them in mind.

**1. Dummies in Log-Linear Models:**

Interpreting a dummy variable's coefficient when the dependent variable has been log-transformed has to be undertaken with care. Trust me, the literature is full of empirical applications where the authors get it wrong, and most of the standard textbooks are no better. The way to interpret the coefficient of a *continuous* regressor in a regression model, where the dependent variable has been log-transformed, can be seen by considering the following regression model:

$$\ln(Y) = a + bX + cD + \varepsilon \tag{1}$$

Here, *X* is a continuous regressor, and *D* is a zero-one dummy variable. The interpretation of the coefficient, *b*, is that it is the partial derivative of *ln*(*Y*) with respect to *X*. So, a change of Δ*X* in *X* (up or down) will lead to a *multiplicative change* of *exp*(*b*Δ*X*) in *Y*, other things held equal. That is, for each one-unit increase in *X*, *Y* will be scaled by *exp*(*b*).
There's another way to express this effect, though. If you recall the Taylor series expansion for *exp*(*x*) that you learned in high school, you'll know that for small values of *b*, we have the approximation *exp*(*b*) ≈ 1 + *b*. This implies that 100*b* is, approximately, the expected *percentage change* in *Y* for a *one-unit* change in *X*. (This is different from an elasticity, of course.) You might find this link helpful if you need an elementary discussion of some of this.
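To see how quickly that approximation deteriorates as *b* grows, here is a minimal Python sketch comparing the exact percentage change, 100[*exp*(*b*) - 1], with the approximation 100*b*. The grid of *b* values is arbitrary and chosen only for illustration.

```python
# Compare the exact % change in Y for a one-unit increase in X with the
# first-order Taylor approximation 100b, for model (1): ln(Y) = a + bX + cD + e.
import math


def exact_pct_change(b: float) -> float:
    """Exact % change in Y for a one-unit increase in X: 100[exp(b) - 1]."""
    return 100.0 * (math.exp(b) - 1.0)


def approx_pct_change(b: float) -> float:
    """Approximation based on exp(b) ~ 1 + b, i.e. 100b."""
    return 100.0 * b


for b in (0.01, 0.05, 0.10, 0.30, 0.60):  # illustrative values only
    exact = exact_pct_change(b)
    approx = approx_pct_change(b)
    print(f"b = {b:4.2f}: exact {exact:6.2f}%, approx {approx:6.2f}%, "
          f"gap {exact - approx:5.2f} points")
```

The gap is negligible for *b* = 0.01, but it exceeds 20 percentage points by the time *b* reaches 0.6.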
Unfortunately, lots of people (who really should know better) then apply the same "reasoning" to the interpretation of *c*. The trouble is, of course, that *D* is not continuous, so we can't differentiate *ln*(*Y*) with respect to *D*. The way to get the percentage effect of *D* on *Y* is pretty obvious. Curiously enough, those same people who go about this the correct way when computing marginal effects in the case of Logit and Probit models just don't seem to do it right in the present context. All we have to do is take the exponential of both sides of equation (1), then evaluate *Y* when *D* = 0 and when *D* = 1. The difference between these two values, divided by the expression for *Y* based on the starting value of *D*, gives you the correct interpretation immediately:
If *D* switches from 0 to 1, the % impact of *D* on *Y* is

$$100[\exp(c) - 1]. \tag{2}$$

If *D* switches from 1 to 0, the % impact of *D* on *Y* is

$$100[\exp(-c) - 1]. \tag{3}$$
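For completeness, here is the algebra behind (2) and (3) written out, under the usual reading that *X* and the error are held fixed while *D* switches:

$$
\begin{aligned}
Y &= \exp(a + bX + \varepsilon)\,\exp(cD), \\
Y\big|_{D=0} &= \exp(a + bX + \varepsilon), \qquad Y\big|_{D=1} = \exp(a + bX + \varepsilon)\,\exp(c), \\
100\,\frac{Y|_{D=1} - Y|_{D=0}}{Y|_{D=0}} &= 100[\exp(c) - 1], \\
100\,\frac{Y|_{D=0} - Y|_{D=1}}{Y|_{D=1}} &= 100[\exp(-c) - 1].
\end{aligned}
$$

The common factor exp(*a* + *bX* + ε) cancels in both ratios, which is why neither impact depends on *X*, and why the two directions give different answers.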
Notice the asymmetry of the impacts - unlike the case of the continuous regressor. Also notice that, in general, these values will be quite different from the 100*c* that some of our chums insist on using. For example, if *c* = 0.6, the naïve econometrician will conclude that there is a 60% impact; whereas it is really an 82.2% positive impact as *D* changes from 0 to 1, and a 45.1% negative impact as *D* goes from 1 to 0! (Recalling the formula for the Taylor series expansion of *exp*(*c*) will make it really transparent why and when things go wrong by using *c* itself.)
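The numbers in this example are easy to reproduce. A minimal Python sketch of equations (2) and (3), treating *c* as a known parameter rather than an estimate:

```python
# Percentage impacts of the dummy D in model (1), from equations (2) and (3).
import math


def pct_impact_switch_on(c: float) -> float:
    """% impact on Y when D switches from 0 to 1: 100[exp(c) - 1], equation (2)."""
    return 100.0 * (math.exp(c) - 1.0)


def pct_impact_switch_off(c: float) -> float:
    """% impact on Y when D switches from 1 to 0: 100[exp(-c) - 1], equation (3)."""
    return 100.0 * (math.exp(-c) - 1.0)


c = 0.6
print(f"naive 100c:    {100 * c:.1f}%")                   # 60.0%
print(f"D from 0 to 1: {pct_impact_switch_on(c):.1f}%")   # 82.2%
print(f"D from 1 to 0: {pct_impact_switch_off(c):.1f}%")  # -45.1%
```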
Let me hasten to spoil your day by assuring you that this is **not** breaking news. This little pearl of wisdom has been around in the mainstream economics/econometrics literature for at least 30 years. Hence the "Read Your History" byline on the cover of *Dummies for Dummies*. Moreover, even more care has to be taken when using an estimated value of *c* - say *c** - after fitting model (1) using OLS. You might be tempted to simply replace *c* with *c** in the formulae in (2) and (3). **Not** a good plan, as we've known since at least 1981! The resulting estimator of the percentage impact is then biased, in a direction that you can figure out for yourself using Jensen's inequality. A nice practical solution - one that gives an almost-unbiased estimator of the % impact of the dummy on *Y* - was suggested by Kennedy (1981), assuming normal errors in (1). You just have to modify the formula in (2) to become 100[exp(*c** - ½*v**(*c**)) - 1], where *v**(*c**) is the estimated variance of *c** - *i.e*., it's the square of the standard error for *c**. You make the corresponding adjustment to the formula in (3), though none of the writers back around 1980 (myself included) actually observed that there are these two separate cases. If you want to be really tricky, and use the exact minimum variance unbiased estimator, I derived the formula for this in Giles (1982). However, it's really messy, and in practice adds very little to Kennedy's estimator that I've just described. My colleague, Ken Stewart, has a nice discussion of this in his excellent book, *Introduction to Applied Econometrics*.
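Here is a small Monte Carlo sketch in Python (NumPy) of the bias in the plug-in estimator and of Kennedy's correction. Everything about the design is an assumption made for illustration: *n* = 20, *D* = 1 for half the sample, normal errors with unit variance, and *c* = 0.6 as in the example above. The `switch_on=False` branch applies the same half-variance correction to formula (3); that is an assumed reading of "the corresponding adjustment", not a formula quoted from Kennedy (1981).

```python
# Monte Carlo sketch: plug-in vs. Kennedy (1981) estimators of the % impact of D
# in model (1), ln(Y) = a + bX + cD + e, with normal errors (an assumption here).
import numpy as np


def pct_impact_plugin(c_hat):
    """Plug the OLS estimate c* straight into equation (2)."""
    return 100.0 * (np.exp(c_hat) - 1.0)


def pct_impact_kennedy(c_hat, var_c_hat, switch_on=True):
    """Almost-unbiased % impact: subtract half the estimated variance of c*.

    switch_on=True : D from 0 to 1, 100[exp(c* - v*/2) - 1]
    switch_on=False: D from 1 to 0, 100[exp(-c* - v*/2) - 1]  (assumed analogue)
    """
    sign = 1.0 if switch_on else -1.0
    return 100.0 * (np.exp(sign * c_hat - 0.5 * var_c_hat) - 1.0)


def simulate(n=20, a=1.0, b=0.5, c=0.6, sigma=1.0, reps=20_000, seed=2011):
    """Average plug-in and Kennedy estimates over simulated samples (hypothetical design)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # continuous regressor, fixed across reps
    d = (np.arange(n) < n // 2).astype(float)    # dummy equal to 1 for the first half
    Z = np.column_stack([np.ones(n), x, d])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    H = ZtZ_inv @ Z.T                            # OLS coefficients = H @ ln(Y)

    log_y = Z @ np.array([a, b, c]) + sigma * rng.normal(size=(reps, n))
    coef = log_y @ H.T                           # each row holds (a*, b*, c*)
    resid = log_y - coef @ Z.T
    s2 = (resid ** 2).sum(axis=1) / (n - Z.shape[1])
    var_c = s2 * ZtZ_inv[2, 2]                   # squared standard error of c*

    return (pct_impact_plugin(coef[:, 2]).mean(),
            pct_impact_kennedy(coef[:, 2], var_c).mean())


true_impact = 100.0 * (np.exp(0.6) - 1.0)
plugin_mean, kennedy_mean = simulate()
print(f"true % impact:          {true_impact:.2f}")
print(f"mean plug-in estimate:  {plugin_mean:.2f}")
print(f"mean Kennedy estimate:  {kennedy_mean:.2f}")
```

Under these assumptions the plug-in average should sit well above the true value (Jensen's inequality at work), while Kennedy's estimator should land close to it.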
So, this is something to think about the next time you're fitting a log-linear regression. If you want to go further than this, and worry about matters beyond point estimation - such as confidence intervals and the like - then you'll be thrilled to know that the sampling distribution of Kennedy's almost-unbiased estimator is nowhere near normal. So be even more careful in this case, and maybe even read the **paper** on which my seminars were based.

**2. Dummies That Take Only One Non-Zero Value:**

Alright, now here's another trap for young players. I'll keep it really brief. You probably know already that if you have a dummy variable that is zero for all but one of the sample values, then your OLS estimates of the model's other coefficients will be identical to those that you'd get if you simply dropped the "special" observation (for which the dummy is non-zero) from the regression altogether. I often set the proof of this as an exercise for my students. In addition, the residual for that one special observation will be exactly zero.

So, be careful how you interpret your OLS results if you choose to use such a dummy variable! I'm not saying that you shouldn't do so. In fact, the standard error for the estimated coefficient on the dummy variable is of some interest. It enables you to test if that observation makes a significant contribution. You could use this information to test if an apparent "outlier" in the sample is having a statistically significant impact on your estimated model.
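A quick numerical check of these claims, as a Python (NumPy) sketch on simulated data. The sample size, coefficients, and choice of "special" observation are all hypothetical.

```python
# Single-valued dummy: OLS with the dummy vs. OLS after dropping that observation.
import numpy as np

rng = np.random.default_rng(42)
n, special = 30, 7                  # hypothetical sample size and "special" observation
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[special] += 5.0                   # make that observation look like an outlier

# Full sample, with a dummy that is 1 only for the special observation
d = np.zeros(n)
d[special] = 1.0
Z_full = np.column_stack([np.ones(n), x, d])
coef_full, *_ = np.linalg.lstsq(Z_full, y, rcond=None)
resid_full = y - Z_full @ coef_full

# Same model without the dummy, after simply dropping the special observation
keep = np.arange(n) != special
Z_drop = np.column_stack([np.ones(keep.sum()), x[keep]])
coef_drop, *_ = np.linalg.lstsq(Z_drop, y[keep], rcond=None)

# t-statistic on the dummy: a test of whether that observation is "special"
s2 = resid_full @ resid_full / (n - Z_full.shape[1])
se_d = np.sqrt(s2 * np.linalg.inv(Z_full.T @ Z_full)[2, 2])

print("intercept and slope, with dummy:    ", coef_full[:2])
print("intercept and slope, obs. dropped:  ", coef_drop)
print("residual of the special observation:", resid_full[special])
print("t-statistic on the dummy:           ", coef_full[2] / se_d)
```

The first two lines of output should agree to rounding error, and the special observation's residual should be zero up to floating-point noise.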

Did you know, however, that this same result holds for lots of other estimation methods, beyond least squares? You won't find it discussed in your textbook, but it's something that is proven, and discussed, in another recent paper of mine (Giles, 2011b). More specifically, the above result relating to the use of single-valued dummy variables also holds for GMM estimation; any generalized IV estimator (including 2SLS and LIML); the MLE for any of the standard count-data models, such as Poisson, Negative Binomial and Exponential; and even for quantile regression.

**3. Inconsistency of OLS When the Number of Non-Zero Values is Fixed:**

I'll bet you didn't know that for many of the situations where you estimate a regression model with a dummy variable in it, the estimator of that variable's coefficient is inconsistent. This has nothing to do with random regressors, measurement error or omitted variables. The model can meet all of the usual "textbook assumptions". Guess what else? The problem I'm alluding to arises not just with OLS estimation, but also with any generalized instrumental variables (IV) estimator. And that's not all! The estimator of that coefficient has a non-normal sampling distribution - even for an infinite sample size! The asymptotic distribution is horribly skewed to the right, so this is really going to cause strife if you try to construct confidence intervals or test hypotheses about the dummy's coefficient, but ignore this fact. Remember - this is an asymptotic result, so it doesn't get any better even if you have a huge sample of data.

What on earth is this all about, and why didn't your mom warn you?

Well, notice that I said "....for many of the situations...". So this problem doesn't always arise. Also notice that I was referring only to the coefficient(s) of the dummy variable regressor(s) - not to estimators of the coefficients of the "regular" (measured) regressors in the model. Everything is just fine in their case. So what are these "...many situations..."? You probably won't like the answer to this, because unfortunately these are situations you'll have met many, many times - they're really common, and rather interesting. In a nutshell, any time that the dummy variable takes a non-zero (usually unit) value for a finite and ***fixed number of observations***, the usual asymptotics don't apply and you get the problems I've just mentioned. Of course, the situation of OLS estimation when there is just a single non-zero value for the dummy variable in the sample is a special example of this, and this case is discussed by **Hendry and Santos** (2005). It doesn't seem to be widely known, however. I provide the generalization from one observation to any finite number of observations, and from OLS to IV estimation, in my recent paper, **Giles** (2011c).
So, consider the following situation, for example:

We want to fit a regression using a sample of data that covers the period 1930 to 1980, and we notice that there is an obvious structural break corresponding to the period of the 2nd World War - 1939 to 1945. So, when we estimate our regression model we include a dummy variable (either to shift the intercept, or multiplicatively to shift one or more of the slope parameters), and this dummy variable is zero except for the 7 years, 1939 to 1945 inclusive. Now, we can't re-write the history books, more's the pity. So, no matter how much more data were to become available, before 1930 or since 1980, our dummy variable will always have just 7 non-zero values. When we look at the coefficient of that dummy variable, the OLS estimator will still be "Best Linear Unbiased" (under our otherwise standard assumptions), but it will be inconsistent. It will be very unreliable even with an infinitely large sample size. We should also be really careful about constructing confidence intervals or tests relating to this coefficient, because of the non-normality of the sampling distribution for this particular OLS estimator, which persists even asymptotically.
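A rough Python (NumPy) sketch of the inconsistency. The design is assumed purely for illustration: a hypothetical model with an intercept, one continuous regressor, normal errors, and a dummy equal to 1 for exactly 7 observations however large *n* becomes. As *n* grows, the sampling standard deviation of the slope estimator shrinks, but that of the dummy's coefficient settles at about σ/√7, because *c** is essentially driven by the errors of those 7 observations alone. The sketch only illustrates the inconsistency; it does not attempt to reproduce the non-normality results in Giles (2011c).

```python
# Sampling spread of OLS estimators when a dummy has a fixed number (7) of ones.
import numpy as np


def sampling_sds(n, n_special=7, sigma=1.0, reps=500, seed=0):
    """Monte Carlo and exact sampling s.d.s of b* (slope) and c* (dummy) under OLS.

    Hypothetical model: y = a + b*x + c*D + error, with D = 1 for exactly
    n_special observations, and normal errors (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    d = np.zeros(n)
    d[:n_special] = 1.0                       # e.g. the 7 war years
    Z = np.column_stack([np.ones(n), x, d])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    H = ZtZ_inv @ Z.T                         # OLS estimates = H @ y

    beta = np.array([1.0, 0.5, 2.0])          # hypothetical true a, b, c
    draws = np.empty((reps, 3))
    for r in range(reps):
        y = Z @ beta + sigma * rng.normal(size=n)
        draws[r] = H @ y
    mc_sd = draws.std(axis=0, ddof=1)
    exact_sd = sigma * np.sqrt(np.diag(ZtZ_inv))
    return mc_sd[1], mc_sd[2], exact_sd[1], exact_sd[2]


results = {}
for n in (100, 1_000, 10_000):
    results[n] = sampling_sds(n)
    sd_b, sd_c, exact_b, exact_c = results[n]
    print(f"n = {n:6d}: s.d.(b*) = {sd_b:.4f} (exact {exact_b:.4f}), "
          f"s.d.(c*) = {sd_c:.4f} (exact {exact_c:.4f})")
print(f"limit for s.d.(c*) with sigma = 1: 1/sqrt(7) = {1 / np.sqrt(7):.4f}")
```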

How many times have you seen empirical studies, perhaps using thousands of observations, where dummy variables of the type I've mentioned appear as regressors? Lots, I'll bet. Those large samples are not much help at all in this case, and you should be skeptical when the authors get all excited about the interpretation of the coefficients of their dummy variables. These numbers mean very little at all!

**4. The Perils of Using Seasonal Dummy Variables:**

Finally, ask yourself: "How many times have I estimated an OLS regression model using quarterly time-series data, and included seasonal dummy variables to deal with the observed seasonality in the dependent variable?" (Probably more times than you can recall.) Now ask yourself: "What on earth had I been inhaling?" (Don't answer that if you don't want to. Just send me an email and I promise - nudge, nudge - it won't go viral.)

Now, don't panic - I'm not about to launch into a boring little homily about the "dummy variable trap". Here's the thing. Do you recall the **Frisch-Waugh** Theorem? It was actually published in volume 1 of *Econometrica*, would you believe! In the context of our seasonal dummy variables, this theorem tells us the following, as was pointed out by **Lovell** (1963). Suppose that we estimate the following regression model by OLS, where the $S_i$'s are the quarterly seasonal dummy variables:

$$Y = a + bX + c_1 S_1 + c_2 S_2 + c_3 S_3 + e. \tag{4}$$

Let *b** be the OLS estimator of *b*.
Now, suppose that we decide to "seasonally adjust" the *Y* data by "explaining" the seasonal component in that variable using the seasonal dummies, and then eliminating that part of the series. So, we fit an OLS regression:

$$Y = a + c_1 S_1 + c_2 S_2 + c_3 S_3 + v, \tag{5}$$

and then treat the residuals as the seasonally adjusted *Y* series, $Y^{sa}$. We do the same sort of thing to "seasonally adjust" the *X* series. We fit the OLS regression:

$$X = a' + c'_1 S_1 + c'_2 S_2 + c'_3 S_3 + u, \tag{6}$$

and treat these residuals as the seasonally adjusted series, $X^{sa}$. Finally, we regress $Y^{sa}$ on $X^{sa}$:

$$Y^{sa} = a'' + b'' X^{sa} + e'. \tag{7}$$
The **Frisch-Waugh-Lovell** Theorem tells us that the OLS estimator of $b''$ in (7) will be identical to the OLS estimator of *b*, namely *b**, in (4).
This is a purely algebraic result - it doesn't rely on any "statistics" *per se*, and it certainly doesn't rely on any assumptions about the random errors in any of the fitted models. In addition, it doesn't even require that OLS estimation be used throughout. I showed some years ago (Giles, 1984) that the same results emerge if you replace the OLS estimator with any IV estimator.
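The result in (4)-(7) is easy to verify numerically. Below is a Python (NumPy) sketch with simulated quarterly data; the data-generating process is hypothetical and chosen only so that both series have seasonal patterns. Quarter 4 is the omitted base category, matching the three dummies in (4).

```python
# Numerical check of the Frisch-Waugh-Lovell result for equations (4)-(7),
# using simulated quarterly data (a hypothetical data-generating process).
import numpy as np

rng = np.random.default_rng(1963)
n_years = 20
n = 4 * n_years
quarter = np.tile(np.arange(4), n_years)      # 0, 1, 2, 3, 0, 1, ...
S = np.column_stack([(quarter == q).astype(float) for q in range(3)])  # S1, S2, S3
const = np.ones(n)

x = rng.normal(size=n) + S @ np.array([1.0, -0.5, 0.3])
y = 2.0 + 0.8 * x + S @ np.array([0.4, 1.2, -0.7]) + rng.normal(size=n)


def ols(design, target):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# (4): Y on a constant, X and the seasonal dummies; b* is the coefficient on X
b_star = ols(np.column_stack([const, x, S]), y)[1]

# (5) and (6): "seasonally adjust" Y and X by keeping the residuals
D = np.column_stack([const, S])
y_sa = y - D @ ols(D, y)
x_sa = x - D @ ols(D, x)

# (7): Y^sa on a constant and X^sa; b'' is the coefficient on X^sa
b_double_prime = ols(np.column_stack([const, x_sa]), y_sa)[1]

# The "adjusted" Y is just Y minus its own quarterly means
quarterly_means = np.array([y[quarter == q].mean() for q in range(4)])
max_gap = np.abs(y_sa - (y - quarterly_means[quarter])).max()

print(f"b* from (4):  {b_star:.10f}")
print(f"b'' from (7): {b_double_prime:.10f}")
print(f"max |Y^sa - (Y - quarterly mean of Y)|: {max_gap:.2e}")
```

The last line makes the mechanics visible: the dummy-variable "adjustment" simply subtracts each series' own quarterly means, a fixed additive pattern and nothing more.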
What you need to be aware of is that this is not just a rather quaint little result. The implications of what we've just seen are actually quite important. Let's see why this is. First, if we fit a regression with regular data and seasonal dummy variables, this is equivalent to "seasonally adjusting" **all** of the data (*Y* and *X*). Second, the variables have all been effectively "seasonally adjusted" in exactly the same way, which is totally unrealistic - this is not what happens when our statistical agencies seasonally adjust time-series using the **Census X-12-ARIMA** method (which you can download for free, and is a standard feature in **EViews**, if you use that package). Third, the data have not really been seasonally adjusted at all, because no account has been taken of the other components of the time-series, *Y* and *X*. In general, they will have trend and cyclical components that need to be taken into account, properly, and differently for each series, as is done when the X-12-ARIMA method is used.
So the bottom line is that including seasonal dummy variables makes sense only if: (a) you think that the dependent variable and **all** of the regressors in your model have a simple additive seasonal component; and (b) you think they ***don't*** have any trend or cyclical components! When could you last put your hand on your heart and swear that this was the case in practice?
Anyway, I hope that this sneak preview will whet your appetite somewhat, and I look forward to receiving the flood of orders for *Dummies for Dummies* when it rolls off the presses. You'll be the first to know - trust me!

__Note__: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

__References__

**Frisch**, R. and F. V. Waugh (1933). Partial time regressions as compared with individual trends. *Econometrica*, 1, 387-401.

**Giles**, D. E. (1984). Instrumental variables regressions involving seasonal data. *Economics Letters*, 14, 339-343.

**Giles**, D. E. (2011a). Interpreting dummy variables in semi-logarithmic regression models: exact distributional results. Econometrics Working Paper EWP1101, Department of Economics, University of Victoria.

**Giles**, D. E. (2011b). Econometric models with single-valued dummy variables. Mimeo., Department of Economics, University of Victoria.

**Giles**, D. E. (2011c). On the inconsistency of instrumental variables estimators for the coefficients of certain dummy variables. Econometrics Working Paper EWP1106, Department of Economics, University of Victoria.

**Hendry**, D. F. and C. Santos (2005). Regression models with data-based indicator variables. *Oxford Bulletin of Economics and Statistics*, 67, 571-595.

**Kennedy**, P. E. (1981). Estimation with correctly interpreted dummy variables in semilogarithmic equations. *American Economic Review*, 71, 801.

**Lovell**, M. C. (1963). Seasonal adjustment of economic time series. *Journal of the American Statistical Association*, 58, 993-1010.

**Lovell**, M. C. (2008). A simple proof of the FWL (Frisch, Waugh, Lovell) theorem. *Journal of Economic Education*, 39, 88-91.

(This one is definitely) © 2011, David Giles

I am a little unclear where the factor of 100 comes from. I tried deriving it myself, and I also simulated an empirical example, and I just can't seem to make sense of this.

Dimitriy: It's because I'm reporting percentage impacts.

Doh! I think I left my brain at home today. Thanks for your help, and for the best econometrics blog out there.

Dimitriy: No problem! Glad you like the blog.

Is Giles (2011b) available for public consumption anywhere?

Dimitriy - still trying to finalize it!!!!!!!!!!!

This blog is an awesome help, but probably due to the shallowness of my statistical knowledge I can't find the answer to an urgent problem of mine:

I've got a regression analysis in which one independent variable virtually works as a dummy variable. The variable is called "GDPleader", but since the USA is the permanent leader in this regression it's just a dummy for the USA.

In the linear model the variable is positive and significant, but after the logit transformation it turns out to be negative and not significant anymore.

The authors of the paper (Chinn and Frankel 2005) don't comment on this at all, but I really want to know what's happening.

Thank you!

Luca - it's hard to tell without seeing the data. However, it's an interesting result because all too often we see people saying "I just used OLS because the results are basically the same as after a logistic transformation". In the case you cite, they are fundamentally different.

Mr. Giles, is it fine if we estimate a probit model in which all of the regressors are dummy variables?

Thanks.

Yes, that's just fine.

This is extremely helpful for a neophyte.

Are there similar traps we should be looking out for with interpreting coefficients on dummy variables in quantile regression models, with or without log dependent variables?

For example, suppose I'm doing median regression looking at a binary treatment T and have one binary covariate (e.g. male / female). If I run med(y|T,F) = b1*T + b2*F + b3*(TxF), can we do the usual additive thing and say that the effect on the median for women is b1+b3?

And what if the model is med(log(y)|T,F)?

Bert - an interesting question. Yes, (b1 + b3) should be interpreted as you suggest in this case. This is still differences-in-differences, but the response is the median, not the mean.

In the second case, taking logs for the dependent variable could be motivated by a desire to have the usual regression coefficients measure RELATIVE changes, rather than level changes. Here, those relative changes would be with respect to the median, not the mean, and (b1+b3) would be interpreted as such.

Thank you for an excellent post. Only recently did I become aware of the need to transform the dummy coefficients.

Maybe I missed some reference, but it seems that textbooks don't mention it: for example, Wooldridge's graduate textbook does not mention it, and his undergraduate textbook mentions only the "naive" (100[exp(c) - 1]) transformation. He also seems to argue that because the impacts of going from D=0 to D=1 are different from going from D=1 to D=0, as you mentioned in the text, only the OLS beta should be reported or commented on. I found nothing on the topic in "Mostly Harmless Econometrics", or in Cameron and Trivedi's (2005) "Microeconometrics - Methods and Applications". It definitely was never mentioned in my graduate econometrics classes.

I'm curious to get an estimate of how often applied empirical papers interpret the coefficients of the dummy variables after making the transformation. From memory I could not think of many. For instance:

- in Krueger's 1993 QJE paper on computers and the wage structure, the "naive" transformation is used (http://flash.lakeheadu.ca/~mshannon/Krueger_QJE93.pdf);

- in Fryer (2011), a recent Handbook of Labor Economics chapter on racial inequality in the US, no transformation is made when interpreting results.

Do you have statistics on this?

In any case, it seems that there is still little awareness of this.

There is this thread at the EJMR: http://www.econjobrumors.com/topic/reporting-dummy-coeficients-in-log-linear-models .

I would like to know your opinion on two topics:

1) Do you agree with the suggestions in the thread that the regression table should have the original OLS estimates and that the transformed coefficients should only be mentioned in the text?

2) The last post in the EJMR thread points to a literature on how OLS log-linear models can be biased if errors are heteroskedastic or heavy-tailed, which is quite common in the data, and that in this case the Kennedy transformation does not hold (as you say, it requires normality). What should be done in those cases?

Thanks for the comment and questions. I agree that this point is generally overlooked - one reason why I talked about it in this post! I certainly don't have any stats. along the lines you asked about. The thread you pointed me to was interesting.

To answer your other questions:

1. I do agree with this suggestion.

2. I was not previously aware of the 3 articles referred to. But then, I never read "Journal of Health Economics". :-) I will certainly read the papers though. Perhaps after doing so I'll have some suggestions as to what to do in the case(s) you mention. Sorry I don't have a quick answer.

Thanks for pointing me to this!

Hi Prof Giles,

I have a related question: what if the regressors are also logged? I've seen applications where people are estimating something along the lines of

LnY = a + BlnX + cD + e, where D is a binary dummy,

but to my mind, that doesn't seem to make sense, since if we take the exponential of both sides, D is no longer a dummy, since e^0 = 1 and e^1 = e. So is there really any meaningful interpretation?

That's fine. They have in mind a model in which the error enters multiplicatively:

Y = A (X^B)(DUM^c)exp(e), where DUM = 1 or e

Then LnY = a + BLnX + cLn(DUM) + e, where a = Ln(A).

Notice that D = Ln(DUM) = 0 or 1, as required.

Oh my, that actually makes a lot of sense now you've put it like that! Glad to know my MSc dissertation wasn't fatally flawed after all.

Many thanks, I'll definitely visit this blog again.

I am so thankful to find your great blog here, Prof Giles! I am doing a regression using panel data on 172 regencies from 2001-2012. Based on some arguments, I did the unit root test (because my time series dimension is long enough). I found that one of my dummy variables wasn't stationary; can I include it in my model, or should I drop it?

Would you please recommend some references--journals or articles? Thank you.

Esti - you don't need to test dummy variables for stationarity - all dummy variables are stationary by construction.

Great post from a great person, thank you Dave.

Hi Dave -- great post, thanks. I was wondering if I need to be careful in interpreting proportion variables, as well? I have three categories and the proportions of each sum to 1, so I include only two of the proportions. I also include squared terms for those two proportions. The model is log-linear. Any pitfalls I should be aware of?

Thanks, Nancy

Nancy - I think you've got it covered.

DG

Hi Prof Giles, thanks for this post!

Just a quick question: when using the formulas for interpreting the dummy variables, does the sign of c matter? I.e., if c is negative, would you include the sign in the formula or not?

Yes, you certainly would.

Excellent post, thank you for all these insights; I learned a lot! Could you please help with the following? I am reading a paper in which the original dependent variable Y takes many values of zero and "some" positive values. The authors log-transformed all positive values and used ln(y) for all y > 0 as the new dependent variable, but included in the model as a regressor a dummy taking the value of 1 whenever the original variable was zero. Thus, the model is ln(y) = a + b*D + C*X + e, where D = 1 if Y = 0 and D = 0 otherwise. Does this make sense to you? If yes, what is the interpretation of the dummy variable here?

No, this makes no real sense to me.

Thank you for just deleting the comment. I was trying to help.

Being facetious is hardly going to help. Your comment and my response were deleted while I have a proper chance to check what you had to say. Then I'll respond. Probably not today, though. Given that you've chosen to remain anonymous, this is the only way I can communicate with you. If you check other posts you'll see that I always appreciate and acknowledge corrections that have had to be made as a result of a comment from a reader.

What I had written originally was in fact perfectly correct. However, if you look at the 2 paragraphs that now appear below equation (1) in the post, you'll find a more detailed explanation. Thank you for querying this - it led me to improve the post.

Hi Dave,

Great blog post! On a related note, would this interpretation also hold if the independent variable is logged and the dependent variable is a dummy variable?

Ie: D = a + b(log Z) + cX + e

Is it simply that, say, a 10% change in Z is associated with a b*10 percentage change in the probability of D?

Thanks!