## Friday, May 3, 2013

### When Can Regression Coefficients Change Sign?

Let's suppose that you've been running regressions happily all morning. It's a sunny day, but what could be better than enjoying some honest-to-goodness econometrics? Suddenly, you notice that one of the estimated coefficients in your model has a sign opposite to what you were expecting (from your vast knowledge of the underlying economics). Shock! Horror!

Well, it's really good that you're on the look-out for that sort of thing. Congratulations! However, something has to be done about this problem.

Being young, with good eyesight, you also happen to spot something else that's interesting. One of the other estimated coefficients has a very low t-statistic. You have a brilliant idea! If you delete the variable associated with the very small t-value, maybe the "wrong" sign on the first coefficient will be reversed. Is this possible?

Sadly, no, it's not going to be that simple.

Let's make sure that what I'm talking about is quite clear. Here are the results for a hypothetical regression, estimated by OLS, and with t-statistics in parentheses:

y = 0.43 + 1.45 X1 - 0.89 X2 + residual    ;   n = 34 ;   R2 = 0.89
   (1.45)   (0.66)   (-1.83)

You were expecting a positive coefficient for X2. If you drop X1 from the regression and re-estimate the model by OLS, could the sign of the coefficient of X2 become positive? No - that's impossible.

Leamer (1975) proved that such a sign change cannot occur if the absolute value of the t-statistic for the variable you're deleting is less than the absolute value of the t-statistic for the variable whose sign you're interested in. That's the situation we have in the example above.

On the other hand, if we were to delete X2 from the above model, it's possible for the sign of the coefficient of X1 to change.
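Leamer's condition is purely algebraic, so it's easy to verify numerically. Here's a minimal simulation of my own (the data-generating process and variable names are my assumptions, not from the post) that fits a model with and without one regressor and checks the implication:

```python
import numpy as np

# Simulated data (illustrative only): two correlated regressors.
rng = np.random.default_rng(123)
n = 34
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 0.4 + 0.1 * x1 - 0.9 * x2 + rng.normal(size=n)

def ols(y, X):
    """OLS with an intercept: returns coefficients and t-statistics."""
    Z = np.column_stack([np.ones(len(y)), np.atleast_2d(X.T).T])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    b = ZtZ_inv @ Z.T @ y
    resid = y - Z @ b
    s2 = resid @ resid / (len(y) - Z.shape[1])
    t = b / np.sqrt(s2 * np.diag(ZtZ_inv))
    return b, t

b_full, t_full = ols(y, np.column_stack([x1, x2]))  # [const, x1, x2]
b_restr, _ = ols(y, x2)                             # x1 deleted

# Leamer (1975): if |t(x1)| < |t(x2)| in the full model, deleting x1
# cannot flip the sign of the x2 coefficient. This is exact, not asymptotic.
if abs(t_full[1]) < abs(t_full[2]):
    assert np.sign(b_restr[1]) == np.sign(b_full[2])
print(b_full, b_restr)
```

Because the result is exact in finite samples, the check inside the `if` can never fail, whatever data you feed in.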

A number of extensions/generalizations of this result are also available, including:

• Leamer's necessary condition, stated above, was extended to include a sufficient condition by Visco (1978).
• Leamer's result was re-stated in a somewhat simpler form by Oksanen (1987).
• Leamer's result was generalized by McAleer et al. (1986) to apply to cases where the deleted variables are combined in arbitrary linear combinations. See, also, Visco (1988).
• All of the above results were shown by Giles (1989) to hold if the model is estimated by any Instrumental Variables estimator, rather than by OLS.
This last extension is based on the algebraic relationship between the OLS and (generalized) I.V. estimators, and the results on restricted I.V. estimation given by Giles (1982).

By the way - HT to a former student of mine, Darren Gibbs, who asked some interesting questions that led me to write the 1989 paper referenced below. I never teach a course without learning something new!

References

Giles, D. E. A., 1982. Instrumental variables estimation with linear restrictions. Sankhya: The Indian Journal of Statistics, B, 44, 343-350.

Giles, D. E. A., 1989. Coefficient sign changes when restricting regression models under instrumental variables estimation. Oxford Bulletin of Economics and Statistics, 51, 465-467.

Leamer, E. E., 1975. A result on the sign of restricted least-squares estimates. Journal of Econometrics, 3, 387-390.

McAleer, M., A. Pagan, & I. Visco, 1986. A further result on the sign of restricted least-squares estimates. Journal of Econometrics, 32, 287-290.

Oksanen, E. H., 1987. On sign changes upon deletion of a variable in linear regression analysis. Oxford Bulletin of Economics and Statistics, 49, 227-229.

Visco, I., 1978. On obtaining the right sign of a coefficient estimate by omitting a variable from the regression. Journal of Econometrics, 7, 115-117.

Visco, I., 1988. Again on sign changes upon deletion of a variable from a linear regression. Oxford Bulletin of Economics and Statistics, 50, 225-227.

1. Interesting! It makes sense, but I didn't know it was so simple.

Of course, removing the other variable CAN make the test go from "significant" to "insignificant," which unfortunately is all anybody seems to care about.

2. So what can we do if we get the wrong sign, but the variable is important?

1. Re-specify the model. E.g., is the functional form appropriate? Does the model pass the usual specification tests? As a last resort, you can constrain the coefficient to have the desired sign.

3. Is it possible for the sign of the regression coefficient to change from "+" in OLS to "-" in 2SLS? Or does it imply that there is an issue with the specification?

1. Yes, this can certainly happen.
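A quick simulation makes the point (my own construction, not from the thread): with an endogenous regressor, the OLS estimate can be biased far enough that its sign differs from the (consistent) IV/2SLS estimate.

```python
import numpy as np

# Illustrative DGP (my assumption): x is endogenous via the shared shock v,
# a valid instrument z is available, and the true slope is negative.
rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)            # instrument: correlated with x, not with u
v = rng.normal(size=n)
x = z + v                         # endogenous regressor
u = v + 0.5 * rng.normal(size=n)  # error correlated with x through v
y = -0.2 * x + u                  # true coefficient is -0.2

b_ols = (x @ y) / (x @ x)         # OLS slope (no intercept; mean-zero data)
b_iv = (z @ y) / (z @ x)          # simple IV (= 2SLS with one instrument)

print(b_ols, b_iv)                # OLS is biased upward; IV is close to -0.2
```

Here plim(b_ols) = -0.2 + cov(x, u)/var(x) = -0.2 + 1/2 = 0.3, so the OLS estimate is positive while the IV estimate recovers the negative true slope. The sign difference alone doesn't prove misspecification; it's exactly what endogeneity can do.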

4. Very useful!