Friday, September 2, 2016

Dummies with Standardized Data

Recently, I received the following interesting email request:
"I would like to have your assistance regarding a few questions related to regression with standardized variables and a set of dummy variables. First of all, if the variables are standardized, (x_i - x_bar)/sigma, can I still run the regression with a constant? And, if my dummy variables have 4 categories, do I include all of them without the constant? Or just three and keep the constant in the regression? And, how do we interpret the coefficients of the dummy variables in such a case? I mean, does the conventional interpretation in a single OLS regression still apply?"

Here's my (brief) email response:
"If all of the variables (including the dependent variable) have been standardized, then in general there is no need to include an intercept. In fact, the OLS estimate of its coefficient will be zero (as it should be), because every standardized variable has a sample mean of zero.
However, if you have (say) 4 categories in the data that you want to allow for with dummy variables, then the usual results apply:
1. You can include all 4 dummies (but no intercept). With standardized data, the estimated coefficients on the dummies will sum to zero once each is weighted by its category's share of the sample (so the simple sum is zero when the categories are equal in size). Each separate coefficient gives you the deviation from zero for the intercept in each category.
OR (equivalently)
2. You can include an intercept and any 3 of the dummies. Again there is an adding-up result: the estimated intercept, plus the estimated coefficients of the dummies each weighted by its category's share of the sample, will sum to zero. Suppose that you include the intercept and the dummies D2, D3, and D4. The estimated coefficient of the intercept gives you the intercept effect for category 1. The estimated coefficient for D2 gives you the deviation of the intercept for category 2 from that for category 1, etc."
You can easily verify this by fitting a few OLS regressions, and there's a lot more about regression analysis with standardized data in this earlier post of mine.
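As a rough illustration of that kind of check, here is a minimal Python sketch using NumPy and simulated data. Everything in it is an assumption made for the illustration rather than something taken from the email exchange: the sample size, the category probabilities, the made-up coefficients, and the helper names `standardize` and `ols`. Only y and x are standardized; the dummies are left as 0/1 variables.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Simulated data (hypothetical): 4 unequal-sized categories, one continuous regressor.
n = 200
cat = rng.choice(4, size=n, p=[0.4, 0.3, 0.2, 0.1])  # category labels 0..3
x_raw = rng.normal(size=n)
shifts = np.array([1.0, -0.5, 2.0, 0.3])             # made-up category intercepts
y_raw = 0.8 * x_raw + shifts[cat] + rng.normal(size=n)

def standardize(v):
    """Return (v - mean) / sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

def ols(X, y):
    """OLS coefficient estimates via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

y, x = standardize(y_raw), standardize(x_raw)
D = (cat[:, None] == np.arange(4)).astype(float)     # n x 4 matrix of 0/1 dummies
ones = np.ones(n)

# No dummies at all: intercept and standardized x only.
b_plain = ols(np.column_stack([ones, x]), y)

# Option 1: all four dummies, no intercept.
b_all = ols(np.column_stack([x, D]), y)
slope1, d = b_all[0], b_all[1:]

# Option 2: intercept plus D2, D3 and D4 (category 1 is the base).
b_int = ols(np.column_stack([ones, x, D[:, 1:]]), y)
a, slope2, g = b_int[0], b_int[1], b_int[2:]

shares = D.mean(axis=0)  # sample share of each category

print("intercept, no dummies:        ", b_plain[0])
print("option 1 dummy coefficients:  ", d)
print("option 2 implied intercepts:  ", np.concatenate([[a], a + g]))
print("simple sum of option 1 coefs: ", d.sum())
print("share-weighted sum of coefs:  ", shares @ d)
```

The two specifications should give the same slope and the same four category intercepts, with the option 2 intercepts recovered as the intercept and the intercept plus each dummy coefficient. With unequal category sizes, as in this simulated sample, the quantity that is zero by construction is the share-weighted sum of the dummy coefficients; it coincides with the simple sum only when the four categories are equally represented.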

© 2016, David E. Giles