Tuesday, April 14, 2015

Regression Coefficients & Units of Measurement

A linear regression equation is just that - an equation. This means that when any of the variables - dependent or explanatory - has units of measurement, we also have to keep track of the units of measurement for the estimated regression coefficients.

All too often this seems to be something that students of econometrics tend to overlook.

Consider the following regression model:

               yi = β0 + β1x1i + β2x2i + β3x3i + εi    ;    i = 1, 2, ..., n                   (1)

where y and x2 are measured in dollars; x1 is measured in Kg; and x3 is a unitless index.

Because the term on the left side of (1) has units of dollars, every term on the right side of that equation must also be expressed in terms of dollars. These terms are β0, (β1x1i), (β2x2i), (β3x3i), and εi.

In turn, this implies that β0 and β3 have units which are dollars; the units of β1 are ($ / Kg); and β2 is unitless. In addition, the error term, ε, has units that are dollars, and so does its standard deviation, σ.
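
To make this accounting concrete, here's a minimal sketch in Python (my addition, with entirely made-up data; it isn't part of the original post). It shows that re-expressing x1 in grams rather than Kg simply divides the estimated β1 by 1,000 - the units change, but the economic content doesn't:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 100
    x1 = rng.uniform(1, 10, n)                         # Kg
    x2 = rng.uniform(10, 100, n)                       # dollars
    x3 = rng.uniform(0, 1, n)                          # unitless index
    y = 5 + 2.5 * x1 + 0.3 * x2 + 7 * x3 + rng.normal(0, 1, n)   # dollars

    # OLS with x1 in Kg: the slope on x1 has units ($ / Kg)
    X = np.column_stack([np.ones(n), x1, x2, x3])
    b = np.linalg.lstsq(X, y, rcond=None)[0]

    # OLS with x1 converted to grams: the slope now has units ($ / g)
    Xg = np.column_stack([np.ones(n), 1000 * x1, x2, x3])
    bg = np.linalg.lstsq(Xg, y, rcond=None)[0]

    print(b[1], 1000 * bg[1])    # identical: the $/g slope is 1/1000 of the $/Kg slope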

What are some of the implications of this?

The "standard errors" associated with each OLS estimate of the β's also have units. They're the same as for the β's themselves. (The t-ratios, of course, are always unitless.) Also, strictly speaking, when we report confidence intervals for any of the β's these units of measurement should also be reported. 

Another important implication is that we should be very careful indeed when comparing the numerical values of estimated regression coefficients, even within the same model. Suppose that the OLS point estimates of β1 and β3 in (1) are 1.0 and 3.0 respectively. Does this mean that changes in x3 have three times the impact on y, as compared with changes in x1? Certainly not!

Remember, that value of 1.0 is actually $1.0 per Kg, whereas the 3.0 value is actually $3.0. You can't compare magnitudes that are in different units unless this difference is properly taken into account.

There's also a slightly more subtle point that goes beyond this simple arithmetic.

To take the discussion a step further, suppose that we added another regressor to equation (1): x4, with units of dollars, and a coefficient of β4. Suppose, too, that the OLS estimates of β2 and β4 are 2.0 and 4.0 respectively. Both of these numbers are unitless, so we can legitimately say that the point estimate of β4 is twice as big as the point estimate of β2.

However, can we say that "the impact of x4 on y is twice as big as the impact of x2 on y"? I know that it's tempting to do so!

To answer this question, first we have to decide what it actually means!

Specifically, we have to decide what sort of "change" in the variables we're talking about when we use the term "impact".

It's true that a one-unit (dollar) change in x4 leads to a change in the dollar value of y that is twice the size of the dollar change in y that occurs when x2 changes by one unit. However, consider the following point.

Suppose that the sample size is n = 6, and that the sample values for x2 and x4 are x2: {$1, $2, $3, $4, $5, $6.45}; and x4: {$0.01, $0.02, $0.03, $0.04, $0.05, $0.0645}. The sample averages and standard deviations are 3.575 and 1.9959 for x2, and 0.03575 and 0.019959 for x4. So, a one-unit change in x2 is a relatively modest change, in the sense that it's equivalent to approximately half a standard deviation. On the other hand, a one-unit change in x4 is quite substantial, in the sense that it's a change of roughly 50 standard deviations!

If the y variable has a sample standard deviation of 1.0, then the interpretation of the OLS estimates (2.0 and 4.0) of β2 and β4 is as follows. Ceteris paribus, a change of half a standard deviation in x2 will lead to a 2 standard deviation change in y; while a change of 50 standard deviations in x4 will lead to a 4 standard deviation change in y.
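
The arithmetic here is easy to check. A quick Python sketch, using the six artificial sample values given above:

    import numpy as np

    x2 = np.array([1, 2, 3, 4, 5, 6.45])    # dollars
    x4 = x2 / 100                            # also dollars, on a much smaller scale

    # Sample means and standard deviations (ddof=1 gives the sample s.d.)
    print(x2.mean(), x2.std(ddof=1))         # 3.575    1.9959...
    print(x4.mean(), x4.std(ddof=1))         # 0.03575  0.019959...

    # A one-unit ($1) change, expressed in standard deviations of each regressor
    print(1 / x2.std(ddof=1))                # about 0.5 standard deviations
    print(1 / x4.std(ddof=1))                # about 50 standard deviations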

Now which coefficient estimate do you think is the "larger" - that of β2 or that of β4?

This suggests an alternative way of thinking about the "impacts" of x2 and x4 on y. We might measure these impacts in terms of changes in the variables after they have been scaled to take into account the different sample variations in the data.

Let's illustrate this by estimating some OLS regression models using EViews 9. The workfile and the (totally artificial) data are available on the code and data pages for this post, respectively.

Here are my OLS results, using the "raw" data:

[EViews output: OLS estimates from the raw data]

Then, if I select the "View" tab and choose "Coefficient Diagnostics" and then "Scaled Coefficients" from the drop-down menus, I get:

[EViews output: scaled coefficients table, including the "Standardized Coefficients"]

According to the EViews manual, the "Standardized Coefficients" in the table above are ".... the point estimates of the coefficients standardized by multiplying by the standard deviation of the regressor divided by the standard deviation of the dependent variable."

That's absolutely correct, but an alternative way of describing them is as follows. Divide y by sy, x1 by sx1, and x2 by sx2, where sy is the sample standard deviation of y, and sxj is the sample standard deviation of xj (j = 1, 2). When we then estimate this modified model by OLS, the resulting regression coefficients are the "Standardized Coefficients". Let's verify this using EViews:

[EViews output: OLS regression of y/sy on x1/sx1 and x2/sx2]

The estimated coefficients for the (non-constant) regressors are identical to the corresponding "standardized coefficients" that we saw above.
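
Readers without EViews can check the same equivalence in, say, Python with statsmodels (a sketch only - the data below are artificial, not the workfile from this post):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(123)
    n = 50
    x1 = rng.normal(10, 3, n)
    x2 = rng.normal(50, 12, n)
    y = 4 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 2, n)
    sy, s1, s2 = y.std(ddof=1), x1.std(ddof=1), x2.std(ddof=1)

    # OLS on the raw data, then standardize: beta_j = b_j * (s_xj / s_y)
    b = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params
    print(b[1] * s1 / sy, b[2] * s2 / sy)

    # OLS after dividing y by sy and each xj by sxj reproduces those numbers
    bs = sm.OLS(y / sy, sm.add_constant(np.column_stack([x1 / s1, x2 / s2]))).fit().params
    print(bs[1], bs[2])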

Equivalently, we can estimate the model by OLS after scaling the regressors x1 and x2 by multiplying them by (sy / sx1) and (sy / sx2), respectively. In addition, we'll multiply the intercept "variable" (the series of "ones") by sy. Then we'll estimate the regression model, with y itself as the dependent variable:

[EViews output: OLS regression of y on the rescaled intercept and regressors]

Once again, the estimated coefficients of x1 and x2 match the earlier standardized coefficients.
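
That scaling is just as easy to check outside EViews. Here it is in the same sketch form (self-contained, with the same artificial data regenerated):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(123)
    n = 50
    x1 = rng.normal(10, 3, n)
    x2 = rng.normal(50, 12, n)
    y = 4 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 2, n)
    sy, s1, s2 = y.std(ddof=1), x1.std(ddof=1), x2.std(ddof=1)

    # Intercept column scaled by sy; each regressor multiplied by (sy / sxj)
    X = np.column_stack([sy * np.ones(n), x1 * sy / s1, x2 * sy / s2])
    b = sm.OLS(y, X).fit().params
    print(b[1], b[2])    # the standardized coefficients once more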

Finally, we can view the standardized coefficients as what we get if we literally standardize every variable in the regression model by subtracting the sample mean and dividing by the corresponding sample standard deviation. (The intercept vanishes from the model once we subtract the mean of the column of "ones".) In EViews, with "ybar" denoting the sample average of the y variable, etc., we get:

[EViews output: OLS regression of (y - ybar)/sy on the correspondingly standardized regressors]

Personally, it's this last version of the model that I prefer to think of as the basis for the so-called standardized coefficients. In addition, Andrew Gelman (2008) points out that subtracting the sample means to centre the data at zero makes interpretation of (any) interaction effects easier.
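
For completeness, here's this last, fully standardized, version in the same artificial-data Python sketch:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(123)
    n = 50
    x1 = rng.normal(10, 3, n)
    x2 = rng.normal(50, 12, n)
    y = 4 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 2, n)

    def z(v):
        # Standardize: subtract the sample mean, divide by the sample s.d.
        return (v - v.mean()) / v.std(ddof=1)

    # No intercept: the column of "ones" vanishes once it's demeaned
    b = sm.OLS(z(y), np.column_stack([z(x1), z(x2)])).fit().params
    print(b)             # the standardized ("beta") coefficients yet again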

Beta coefficients were discussed briefly in Art Goldberger's Econometric Theory (1964, pp. 197-198). He observed that, "Although they are extensively used in psychological statistics, standardized variables and beta coefficients are rarely used in econometrics."

To this, I'd add:
  • They're also used in other disciplines in the social sciences.
  • Other econometrics packages (such as Stata) report beta coefficients.
  • This topic doesn't seem to get much attention (if any) in more recent econometrics text books.
  • Similar results apply if the regression model is nonlinear in the parameters.
My bottom line - I don't actually use beta coefficients at all. I prefer to think in terms of marginal effects measured in the original units of the data.

© 2015, David E. Giles

4 comments:

  1. Very interesting. But how do you interpret the estimated coefficients when the data are I(1), in which case subtracting the mean is nonsense?

     Reply: You're subtracting the sample mean of the data, not the population mean.

  2. Dear Dave! How can we forecast annual values for a future period in EViews 9? Please write a complete blog post on forecasting techniques in EViews, giving examples and showing EViews workfiles.

  3. Very clearly explained. Thank you.

