## Saturday, August 10, 2013

### Large and Small Regression Coefficients

Here's a trap that newbies to regression analysis have been known to fall into. It has to do with comparing the numerical values of the point estimates of regression coefficients, and drawing conclusions that may not actually be justified.

What I have in mind is the following sort of situation. Suppose that Betsy (name changed to protect the innocent) has estimated a regression model that looks like this:

$$Y = 0.4 + 3.0X_1 - 0.7X_2 + 6.0X_3 + \cdots + \text{residual}$$

Betsy is really proud of her first OLS regression, and tells her friends that "X3 is twice as important in explaining Y as X1 is" (or words to that effect).

Putting to one side such issues as statistical significance (I haven't reported any standard errors for the estimated coefficients), is Betsy entitled to make such a statement, based on the earth-shattering observation that "six equals three times two"?

It's no cop-out, but the answer is "it depends"! Specifically, it depends on the units of measurement that we're dealing with here. Yes - units!

You need to remember how the coefficients are interpreted in a linear regression model. They're the partial derivatives of Y with respect to each of the X variables, in turn. It's a ceteris paribus situation, of the type beloved of economists. It's as if the other X variables are "held constant".

In other words, in a linear regression model, the coefficients are the "marginal effects". They measure how much Y changes for one-unit changes in the X variables. The word "unit" is key here! Let's see why.

The first thing that we need to note is that we're dealing with a regression equation here. This means that unless all of the variables happen to be unit-less, the units of the term (Y) on the left side of the equation must equal those of the total right side of the equation. You can't equate apples with oranges.

Looking more closely at the right side of the estimated regression equation, we see that it's made up of a sum of terms. You can't add (or subtract) apples and oranges, either! So now we know two things about the units in question here:
1. The units of Y must be the same as the units of the terms "0.4", "3.0X1", "0.7X2", etc.
2. Each term on the right side of the equation must have the same units as every other term. That is, the units of the term "0.4" must be the same as the units of the terms "3.0X1" and "6.0X3", etc.
Let's make things a little more specific.

Suppose that Y is measured in \$, X1 is measured in Kg, X2 is measured in inches, and X3 is measured in seconds. In that case, the units of measurement for the estimated regression coefficients are \$ (for the intercept), and then \$ per Kg, \$ per inch, and \$ per second for the next three coefficients. This means that every term (coefficient times variable) on the right side of the equation has units of just \$, as required.
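A quick way to see that the size of a coefficient is tied to the units of its regressor is to re-measure one regressor and refit. The sketch below is in Python with NumPy (not the EViews used later in this post) and uses simulated, hypothetical data; the helper `ols` is introduced here for illustration. Re-measuring X1 in grams instead of Kg divides its coefficient by 1,000, while the fitted values and the other coefficients don't change at all.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients for y on X (X already contains any intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: Y in $, X1 in Kg, X3 in seconds (illustration only).
rng = np.random.default_rng(0)
n = 200
x1_kg = rng.normal(50.0, 10.0, n)
x3_sec = rng.normal(30.0, 5.0, n)
y = 0.4 + 3.0 * x1_kg + 6.0 * x3_sec + rng.normal(0.0, 1.0, n)

X_kg = np.column_stack([np.ones(n), x1_kg, x3_sec])
X_g = np.column_stack([np.ones(n), x1_kg * 1000.0, x3_sec])  # X1 re-measured in grams

b_kg = ols(y, X_kg)  # coefficient on X1 is in $ per Kg
b_g = ols(y, X_g)    # coefficient on X1 is in $ per gram

print(b_kg[1], b_g[1])                      # the second is 1,000 times smaller
print(np.allclose(X_kg @ b_kg, X_g @ b_g))  # True: the fitted values are unchanged
```

Nothing about the underlying relationship has changed, only the units, yet a naive comparison of the X1 and X3 coefficients would give a completely different answer.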

Now we can see that there's something fundamentally wrong when Betsy says that the effect of X3 on Y is twice as big as the effect of X1 on Y. The two coefficients that she is "comparing" have different units: \$/second for X3, and \$/Kg for X1. That just doesn't make any sense!

Of course, this particular problem disappears if X3 is also measured in Kg., or if X1 is measured in seconds. It would also disappear if (for example) all of the variables in the model were unit-less.

Notice, though, that I said "this particular problem". Unfortunately, even if we don't have a units-measurement problem, there's typically going to be another reason why Betsy's numerical comparison, and her conclusion about the relative importance of different regressors, is going to be flawed.

A Digression:

To help us at this point, let's think back to two really simple things that we learn about in elementary descriptive statistics:
1. Converting the sample covariance for two random variables into a correlation coefficient.
2. Comparing the variability in one sample with that in another sample.
You'll remember that the first of these involves taking the covariance and dividing it by both of the sample standard deviations (for the two variables). The covariance has units: they're the product of the units for each of the two variables. So, if one variable is measured in \$, and the other in Kg., then their covariance has units of (\$ times Kg.).

This isn't very easy to interpret! Also, a covariance can take any value, positive or negative. So, is a covariance of (say) 7.634 (\$ times Kg.) a large covariance or not?

Converting the covariance to a correlation achieves two things. First, because a standard deviation has the same units as the variable itself, dividing the covariance by the standard deviations creates a measure that is unitless. Second, the resulting correlation has a finite scale. It's bounded in value between -1 and +1. Now it's much easier to interpret the value that we've computed; and we can now make meaningful comparisons between correlations for different problems.
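Here is a small numerical illustration of that standardization, as a Python/NumPy sketch on hypothetical data. Re-measuring the same two variables in cents and grams multiplies the covariance by 100 × 1,000, but leaves the correlation exactly where it was.

```python
import numpy as np

# Hypothetical paired sample: one variable in $, the other in Kg.
price_dollars = np.array([12.0, 15.5, 9.0, 20.0, 17.5, 11.0])
weight_kg = np.array([3.1, 3.9, 2.5, 5.2, 4.4, 2.9])

def sample_cov(u, v):
    """Sample covariance (n - 1 divisor); its units are (units of u) x (units of v)."""
    return np.sum((u - u.mean()) * (v - v.mean())) / (len(u) - 1)

def sample_corr(u, v):
    """Covariance divided by both sample standard deviations: unitless, in [-1, 1]."""
    return sample_cov(u, v) / (u.std(ddof=1) * v.std(ddof=1))

# The same quantities, re-measured in cents and grams.
price_cents = price_dollars * 100.0
weight_grams = weight_kg * 1000.0

print(sample_cov(price_dollars, weight_kg), sample_cov(price_cents, weight_grams))    # differ by a factor of 100,000
print(sample_corr(price_dollars, weight_kg), sample_corr(price_cents, weight_grams))  # identical
```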

We've essentially standardized the original covariance measure to make it more useful.

In Case 2 above, because a sample standard deviation has units that are the same as those of the original variable, we've got a similar problem if we want to compare the standard deviations (or variances) for two different samples. Unless the data in each sample happens to have the same units, it's like comparing apples with oranges, again.

There's a further problem, too. Each sample is likely to have values that differ in terms of their typical magnitude. In this case, we learn that if we divide each sample standard deviation by the associated sample mean, the resulting ratios will be unitless, and re-scaled. The ratio of a standard deviation to the mean is called the sample "coefficient of variation".
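As a minimal sketch in Python/NumPy (hypothetical data), the coefficient of variation is just the sample standard deviation divided by the sample mean. The function assumes the data have a positive mean; the ratio isn't meaningful otherwise.

```python
import numpy as np

def coef_of_variation(x):
    """Sample standard deviation divided by the sample mean (unitless).
    Assumes the sample mean is positive."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

# Hypothetical samples measured in different units.
weights_kg = np.array([61.0, 72.5, 68.0, 80.0, 75.5])
times_sec = np.array([9.8, 10.4, 10.1, 11.0, 9.9])

# These two numbers can be compared directly, even though the raw data can't.
print(coef_of_variation(weights_kg), coef_of_variation(times_sec))

# Changing units (Kg to grams) leaves the coefficient of variation unchanged.
print(np.isclose(coef_of_variation(weights_kg), coef_of_variation(weights_kg * 1000.0)))  # True
```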

Once again, converting standard deviations into coefficients of variation gives us something that we can use to make comparative statements.

Now, if we think back to Betsy's situation, you can see that it has a lot in common with the two things we've just been thinking about. Maybe there's some insight here to help Betsy in her quest to compare the values of her estimated regression coefficients?

Well, yes, there is. It leads us to think of computing what we used to call "Beta Coefficients" in econometrics (e.g., Goldberger, 1964). This term is still used in some disciplines, especially in some of the social sciences. I'm not talking about the original (beta) regression coefficients themselves, and neither am I referring to the "betas" that measure the risk of holding specific financial assets!

I'm talking about "standardized" versions of the regression coefficients.

By now, you can probably guess how these measures are going to be defined and constructed. We take the sample data for the Y variable, and we standardize each of the n values. That is, we construct Yi* = (Yi - Ybar) / s.d.(Y), where Ybar = (1/n)ΣYi. Then we standardize each of the X variables (the regressors in our model) in a corresponding way.
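In Python/NumPy, a minimal version of this standardizing step (a sketch, with a hypothetical function name `standardize` and made-up data) looks like this. The result has a mean of zero and a sample standard deviation of one, and it no longer depends on the units in which the variable was measured.

```python
import numpy as np

def standardize(v):
    """Return Yi* = (Yi - Ybar) / s.d.(Y), using the sample (n - 1) standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

y = np.array([4.0, 7.0, 5.5, 9.0, 6.5])  # hypothetical data
y_star = standardize(y)

print(y_star.mean(), y_star.std(ddof=1))                 # approximately 0 and 1 (up to rounding)
print(np.allclose(standardize(y * 100.0), y_star))        # True: rescaling Y changes nothing
```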

Notice one detail. If there is an intercept variable (a column of ones) among the regressors, as in Betsy's model, this standardizing process will eliminate this intercept. Why? Well, simply because the sample average of the n "one" values for this variable is also just "one", so when we subtract this average, the result is a zero value (at every point in the sample).

Then, using our standardized Y and X data, we fit our model using OLS. The resulting estimates are called the "standardized" regression coefficients, or the "Beta coefficients". Notice that while the terminology, "standardized coefficients", is quite common, it's actually a bit of a misnomer. It's the data that have been standardized, not the coefficients.
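The whole procedure can be sketched in a few lines of Python/NumPy. The data are simulated and hypothetical, and `beta_coefficients` is a name chosen for illustration. The sketch regresses standardized Y on the standardized regressors with no intercept, and checks the result against the usual OLS slopes rescaled by s.d.(Xj) / s.d.(Y).

```python
import numpy as np

def standardize(v):
    # z-score using the sample mean and the sample (n - 1) standard deviation
    return (v - v.mean()) / v.std(ddof=1)

def beta_coefficients(y, X):
    """OLS of standardized y on standardized regressors, with no intercept.
    X holds the regressors only (no column of ones)."""
    y_star = standardize(y)
    X_star = np.column_stack([standardize(X[:, j]) for j in range(X.shape[1])])
    coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return coef

# Hypothetical data: two regressors measured in very different units.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([rng.normal(1000.0, 200.0, n), rng.normal(1.5, 0.2, n)])
y = 5.0 + 0.02 * X[:, 0] - 40.0 * X[:, 1] + rng.normal(0.0, 5.0, n)

# Ordinary OLS with an intercept, for comparison.
b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
slopes = b[1:]

beta = beta_coefficients(y, X)
rescaled = slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)
print(beta)
print(rescaled)  # the same numbers
```

The raw slopes here differ by a factor of roughly 2,000 in absolute value, purely because of the units; the standardized coefficients are on a common, unitless scale.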

What have we gained by all of this? Well, as Yi*, and all of the standardized regressors, are unitless, so are the associated regression coefficients. It makes sense now to compare the values of the "Beta coefficients", and draw conclusions of the type that Betsy tried to come to by (wrongly) using the "regular" regression coefficients.
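The link between the two sets of coefficients can also be written out directly. In the notation introduced here for illustration, $b_j$ is the usual OLS slope on $X_j$ (from the model with an intercept), $b_j^*$ is the corresponding standardized coefficient, $e_i$ is the OLS residual, and $s_Y$, $s_{X_j}$ are the sample standard deviations. Because the OLS residuals sum to zero when an intercept is included, subtracting the sample means from the fitted equation gives

$$Y_i - \bar{Y} = \sum_j b_j \left(X_{ji} - \bar{X}_j\right) + e_i .$$

Dividing through by $s_Y$, and multiplying and dividing each term by $s_{X_j}$:

$$\frac{Y_i - \bar{Y}}{s_Y} = \sum_j \left( b_j \frac{s_{X_j}}{s_Y} \right) \frac{X_{ji} - \bar{X}_j}{s_{X_j}} + \frac{e_i}{s_Y} .$$

The rescaled residuals are still orthogonal to the rescaled regressors, so these are exactly the OLS estimates from the standardized regression:

$$b_j^* = b_j \, \frac{s_{X_j}}{s_Y} .$$

Since $b_j$ has units of (units of Y)/(units of $X_j$), and $s_{X_j}/s_Y$ has units of (units of $X_j$)/(units of Y), the product $b_j^*$ is unitless.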

Here's a simple example to illustrate this. It's an OLS regression for an imports-demand function. The regressor P is the exchange rate, so obviously its units of measurement are different from those of imports (IMP) or GDP! The estimated coefficients have the anticipated signs, and are highly significant.

WWBS (What Would Betsy Say?)

She'd wrongly conclude that (in absolute value) the impact of P on IMP is roughly 70,000 times greater than the impact of GDP on IMP.

To get the standardized regression coefficients we can select:

View → Coefficient Diagnostics → Scaled Coefficients

Here's what we get:

The interpretation of the standardized coefficients is as follows:
• An increase of one sample standard deviation in GDP leads to an increase of 0.62 standard deviations in IMP (ceteris paribus).
• An increase of one sample standard deviation in P leads to a decrease of 0.36 standard deviations in IMP (ceteris paribus).
In this particular sense, and contrary to what Betsy thought, the impact of P on IMP is roughly half the impact of GDP on IMP (in absolute value). Half as big, compared with 70,000 times as big! I'm sure you get the picture!

Now let's check to see where the above results for the standardized coefficients came from.

We can create the standardized data as follows:

```
series IMP_STAR = (IMP - @MEAN(IMP)) / @STDEVS(IMP)
series GDP_STAR = (GDP - @MEAN(GDP)) / @STDEVS(GDP)
series P_STAR   = (P - @MEAN(P)) / @STDEVS(P)
```

(Note that the final "S" in the @STDEVS function ensures that we get sample standard deviations, rather than population standard deviations. Does this actually affect the results for the standardized coefficients?)
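One way to explore that question numerically is the following Python/NumPy sketch on simulated, hypothetical data (it is not the EViews workfile used in this post). Switching from the sample divisor (n - 1) to the population divisor (n) multiplies every standardized series, Y* and each X*, by the same constant, so in this example the OLS estimates from the standardized regression come out the same either way.

```python
import numpy as np

def beta_coefficients(y, X, ddof):
    """OLS of standardized y on standardized regressors (no intercept), with
    standard deviations computed using divisor n - ddof
    (ddof=1: sample, like @STDEVS; ddof=0: population)."""
    y_star = (y - y.mean()) / y.std(ddof=ddof)
    X_star = (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)
    coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return coef

# Hypothetical regressors on very different scales.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2)) * [500.0, 0.1]
y = 1.0 + 0.01 * X[:, 0] - 20.0 * X[:, 1] + rng.normal(size=30)

beta_sample = beta_coefficients(y, X, ddof=1)
beta_population = beta_coefficients(y, X, ddof=0)
print(beta_sample)
print(beta_population)  # the same values
```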

Then, we fit the model with the standardized data (and no intercept):

Yes, we've confirmed the results for the standardized OLS regression coefficients. You can find the EViews workfile that I've used on the Code page for this blog, and the data are on the Data page.

(The standardized regression coefficients are also produced as a matter of course in a number of other packages, such as SPSS and SHAZAM. With Stata, you can obtain them by using the "beta" option when fitting a regression model. In SAS, you use the "stb" option on the "model" statement that accompanies the "proc reg" command for multiple regression. In R, you can load the "QuantPsyc" package and then apply its "lm.beta" function to the model object returned by the usual "lm" command.)

A final, and very important, comment:

Our focus here has been on numerical differences between estimates of the regression coefficients. This is quite different from considering if the corresponding population parameters are different, and looking at the statistical significance associated with a test of that difference.

Reference

Goldberger, A. S., 1964. Econometric Theory. Wiley, New York.

© 2013, David E. Giles

#### 12 comments:

1. If the regression equation is in double-log form, are the coefficients also comparable, since they can be interpreted as elasticities?

Thanks.

1. Yes - of course - elasticities are unitless.

2. Professor, assuming both variables are significant and the coefficients of both variables are roughly the same, if the t-statistic of X1 is 5 times the t-statistic of X3, can one say that X1 is a much stronger determinant than X3?

1. You could say that X1 is statistically more significant than X3.

3. A question not related to this post. Suppose I'm testing the null hypothesis that the size of trade does not have an impact on GDP. My data points range from, say, 1940 to 2010. From the test results, I could not reject the null. Now I added two data points, so that my data run from 1940 to 2012, and I re-estimated the equation. Now I can reject the null and conclude that trade does have an impact on GDP.
How logical is this result?

Kindly reply.

Thanking you in advance.

1. This sort of thing happens all of the time. The model you have fitted is not "stable" over the change in sample period. It's always a good idea to check if your model produces consistent results over different (relevant) sample periods; this is just another robustness check that we should perform. In your case, think how important this result would be if you had used your model for forecasting over the later period!
I wouldn't trust your results - at least not on the basis of the information you've supplied.

2. Thank you Professor.

I think it may be the case that these two new observations are outliers.

The other thing is: suppose I could not reject the null, marginally, at conventional levels (that is, the level of significance = .05 and the P-value is .06). But when I re-estimated the model after adding two data points, I rejected the null very strongly (now the level of significance = .05 and the P-value is .00). I think this is due to an outlier. Is this result acceptable?

Another case: in the first estimation, the level of significance = .05 and the P-value is .10, so I could not reject the null. In the second estimation, the level of significance = .05 and the P-value is .00, so now I can reject the null confidently. Again, I think this is due to an outlier. I think this result is not consistent.

4. Good to know the concept of beta coefficients, though I thought units matter in the interpretation, too. A related question: in other papers, I have seen people interpret the OLS coefficients as "if X1 (and/or X2) varies by one s.d., Y will change by that much". What is the logic of doing that?

5. Professor, I have a regression equation involving, among several predictor variables, dummy variables created for the levels of a categorical variable. For example:
Call Setup Time = C + b1*Loc + b2*TimeOfDay1 + b3*TimeOfDay2 + b4*TimeOfDay3 ...
If the values of b3 and b4 are very similar, can we do a t-test to say that they are probably the means of similar samples, and hence club the levels TimeOfDay2 and TimeOfDay3 together?

1. Yes, if you wish.

6. As the sum of the standardized coefficients is 1, can we interpret the size of each coefficient, on a scale of 1, as the weight (contribution) of that independent variable?

1. If you meant to say "to the dependent variable", then the answer is "yes".