
Friday, May 3, 2013

When Will the Adjusted R-Squared Increase?

The coefficient of determination (R2) and t-statistics have been the subjects of two of my posts in recent days (here and here). There's another related result that a lot of students don't seem to get taught. This one is to do with the behaviour of the "adjusted" R2 when variables are added to or deleted from an OLS regression model.

We all know, and it's trivial to prove, that the addition of any variable to such a regression model cannot decrease the R2 value. In fact, R2 will increase with such an addition to the model in general. Conversely, deleting any regressor from an OLS regression model cannot increase (and will generally reduce) the value of R2.

Indeed, this is precisely why various "adjusted" R2 measures have been suggested over the years. You can boost the goodness-of-fit of the model by throwing anything into the regression, whether it makes economic sense or not.

The adjusted R2 that we typically use involves "correcting" both the numerator and denominator sums of squares (in the usual R2 formula) for the appropriate degrees of freedom. If the sample size is "n", and the model includes "k" regressors (including the intercept), this adjusted R2 can be expressed as:

                        RA2 = 1 - [(n - 1) / (n - k)][1 - R2] ,

and it can be shown that RA2 ≤ R2 ≤ 1. The adjusted R2 can take negative values, and this will occur if and only if R2 < [(k - 1) / (n - 1)].
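
To make the formula concrete, here is a minimal numerical sketch (the values of n, k, and R2 are made up purely for illustration, and Python is used just as a convenient calculator):

    # Adjusted R-squared from R-squared, using the formula above.
    # The values of n, k, and R2 below are illustrative only.

    def adjusted_r2(r2, n, k):
        """Return 1 - [(n - 1) / (n - k)] * (1 - r2); k counts the regressors including the intercept."""
        return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

    n, k = 30, 4
    print(adjusted_r2(0.25, n, k))   # approx.  0.163 (positive, since R2 > (k - 1)/(n - 1) = 3/29)
    print(adjusted_r2(0.05, n, k))   # approx. -0.060 (negative, since R2 < 3/29, or about 0.103)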

Now, what can we say about the behaviour of RA2 if we add regressors or delete them from the model? The adjusted R2 may increase or decrease (or stay the same) when we do this, and there are some simple conditions that determine which will occur.

The first result is that adding a regressor will increase (decrease) RA2 depending on whether the absolute value of the t-statistic associated with that regressor is greater (less) than one in value. RA2 is unchanged if that absolute t-statistic is exactly equal to one.
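
As a quick simulated illustration of this first result, the following sketch fits a model with and without one extra regressor and compares the change in RA2 with that regressor's t-statistic. The data-generating process and sample size are invented for the example, and only numpy is assumed:

    # Simulated check of the rule: adding one regressor raises the adjusted R2
    # if and only if the absolute value of its t-statistic exceeds one.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                       # candidate regressor, irrelevant by construction
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)

    def ols(y, X):
        """OLS of y on X (X includes the intercept); returns (adjusted R2, t-statistics)."""
        n, k = X.shape
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        rss = resid @ resid
        tss = ((y - y.mean()) ** 2).sum()
        r2_adj = 1.0 - (n - 1) / (n - k) * (rss / tss)
        se = np.sqrt(rss / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
        return r2_adj, b / se

    adj_small, _   = ols(y, np.column_stack([np.ones(n), x1]))
    adj_big, t_big = ols(y, np.column_stack([np.ones(n), x1, x2]))

    # The two booleans should always agree:
    print(adj_big > adj_small, abs(t_big[-1]) > 1.0)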

If you drop a regressor from the model, the converse of the above result applies. Dropping a regressor amounts to imposing a (zero) restriction on its coefficient. If you square the t-statistic, you get an F-statistic, and it's exactly the F-statistic for testing if the single linear restriction is valid. Not surprisingly, then, there's a more general result than the one given above - one that applies to a situation where several regressors are simultaneously added to or dropped from the model.

Adding a group of regressors to the model will increase (decrease) RA2 depending on whether the F-statistic for testing that their coefficients are all zero is greater (less) than one in value. RA2 is unchanged if that F-statistic is exactly equal to one.
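
The same kind of check works for a block of regressors, using the F-statistic for their joint exclusion. The sketch below again uses an invented data-generating process; the "F > 1" comparison is the rule just stated:

    # Simulated check of the rule: adding a block of regressors raises the adjusted R2
    # if and only if the F-statistic for excluding the block exceeds one.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    x1 = rng.normal(size=n)
    z = rng.normal(size=(n, 2))                   # a block of two candidate regressors
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)

    def rss_and_adj_r2(y, X):
        """OLS of y on X (X includes the intercept); returns (RSS, adjusted R2)."""
        n, k = X.shape
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = ((y - X @ b) ** 2).sum()
        tss = ((y - y.mean()) ** 2).sum()
        return rss, 1.0 - (n - 1) / (n - k) * (rss / tss)

    X_r = np.column_stack([np.ones(n), x1])       # restricted model (block excluded)
    X_u = np.column_stack([X_r, z])               # unrestricted model (block included)
    rss_r, adj_r = rss_and_adj_r2(y, X_r)
    rss_u, adj_u = rss_and_adj_r2(y, X_u)

    q = X_u.shape[1] - X_r.shape[1]               # number of exclusion restrictions
    F = ((rss_r - rss_u) / q) / (rss_u / (n - X_u.shape[1]))

    # The two booleans should always agree:
    print(adj_u > adj_r, F > 1.0)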

So, you can increase the adjusted coefficient of determination by adding regressors that are statistically insignificant, but the situation isn't quite as bad as with the usual (unadjusted) R2.

Finally, the second result given above generalizes to the case where we are considering any set of linear restrictions on the regression coefficients - not just zero restrictions.

In summary, and not too surprisingly, the behaviour of the adjusted coefficient of determination as we add or delete regressors is quite systematic. If you're a student who's hoping that deleting a regressor with a t-statistic of 1.1 will increase the value of RA2, think again!


© 2013, David E. Giles

13 comments:

  1. Greene "Econometric Analysis" Chapter 3, Exercise 8.

  2. Great little series.

    I'm troubled by the last sentence, though: “If you're a student who's hoping that deleting a regressor with a t-statistic of 1.1 will increase the value of RA2, think again!”

    Fiddling with the list of regressors seems like a GREAT way to exhaust your degrees of freedom without realizing it. In some of my equations with inherently low power I ding the d.f. by one for subsequent regressions AFTER I've tried and removed a regressor. That seems consistent with the spirit of better assessing how the equations will work out of sample.

  3. Great post. I'd love to see the proofs of these results discussed in your very clear style.

  4. Hi Dave,

    Thanks for the post.
    Can you clarify a doubt? What's the problem with a model when R² reaches a value greater than 1? Is it because there are too many predictors?
    Thanks in advance.
    Amom

    Replies
    1. This can't happen if you're using OLS with an intercept in the model.

  5. Hi Dave,
    I was wondering if you did post a proof for this.

    Replies
    1. Zeba - I overlooked this. Once I've finished grading the current exams, I'll post a short proof.

    2. Zeba - see my new post, here: http://davegiles.blogspot.ca/2014/04/proof-of-result-about-adjusted.html

  6. Hi Dave,
    I ran a multiple regression model. I had a variable with very low correlation with the dependent variable, and its scatter plot showed no trend. When I ran a regression with only this variable, it was insignificant, as expected.
    Now, when I include it with the other variables, it comes out significant. Also, when I drop it, the model's R-squared falls by 8%.
    1. Can you hypothesise what's happening? 2. Should I throw away a variable that adds an incremental 8% to R-squared, just because it's insignificant on its own, its scatter plot shows no trend, and its correlation coefficient is poor?

    Replies
    1. Sounds like a case of multicollinearity to me. I bet the regressor in question is highly correlated with the other regressors. Other than that I can't tell without looking at your data - and I'm afraid I don't have time to do that.

