Friday, May 3, 2013

When Will the Adjusted R-Squared Increase?

The coefficient of determination (R2) and t-statistics have been the subjects of two of my posts in recent days (here and here). There's another related result that a lot of students don't seem to get taught. This one is to do with the behaviour of the "adjusted" R2 when variables are added to or deleted from an OLS regression model.

We all know, and it's trivial to prove, that the addition of any variable to such a regression model cannot decrease the R2 value. In fact, R2 will increase with such an addition to the model in general. Conversely, deleting any regressor from an OLS regression model cannot increase (and will generally reduce) the value of R2.
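
As a quick numerical illustration of this point, here is a minimal sketch in Python. It assumes NumPy is available; the simulated data, the seed, and names such as `r_squared` and `junk` are illustrative choices of mine, not taken from any real application.

```python
import numpy as np

def r_squared(X, y):
    """Centered R-squared from an OLS fit of y on X (X must contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

rng = np.random.default_rng(0)      # arbitrary seed, used only for reproducibility
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)           # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, junk])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(f"R2 without junk regressor: {r2_small:.6f}")
print(f"R2 with junk regressor:    {r2_big:.6f}")
```

Whatever seed is used, the second figure should never be smaller than the first, even though `junk` has nothing to do with `y`.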

Indeed, this is precisely why various "adjusted" R2 measures have been suggested over the years. Without some adjustment, you can boost the apparent goodness-of-fit of the model by throwing anything into the regression, whether it makes economic sense or not.

The adjusted R2 that we typically use involves "correcting" both the numerator and denominator sums of squares (in the usual R2 formula) for the appropriate degrees of freedom. If the sample size is "n", and the model includes "k" regressors (including the intercept) this adjusted R2 can be expressed as:

                        RA2 = 1 - [(n - 1) / (n - k)][1 - R2] ,

and it can be shown that RA2 ≤ R2 ≤ 1. The adjusted R2 can take negative values, and this will occur if and only if R2 < [(k - 1) / (n - 1)]. When R2 equals [(k - 1) / (n - 1)] exactly, RA2 is zero.
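
The formula is easy to check by hand. The following sketch, in plain Python, evaluates it on either side of the value (k - 1) / (n - 1); the function name `adjusted_r2` and the values n = 20, k = 5 are hypothetical, chosen only for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (k counts the intercept)."""
    if n <= k:
        raise ValueError("need n > k for the degrees-of-freedom correction")
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

n, k = 20, 5
threshold = (k - 1) / (n - 1)       # value of R2 at which the adjusted R2 changes sign

for r2 in (0.10, threshold, 0.50):
    print(f"R2 = {r2:.4f}   adjusted R2 = {adjusted_r2(r2, n, k):.4f}")
```

With these numbers the adjusted value is negative for R2 = 0.10, essentially zero at the threshold, and positive (but below R2) for R2 = 0.50.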

Now, what can we say about the behaviour of RA2 if we add regressors or delete them from the model? The adjusted R2 may increase or decrease (or stay the same) when we do this, and there are some simple conditions that determine which will occur.

The first result is that adding a regressor will increase (decrease) RA2 if the absolute value of the t-statistic associated with that regressor, in the enlarged model, is greater (less) than one. RA2 is unchanged if that absolute t-statistic is exactly equal to one.
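
This result can be checked by simulation. The sketch below assumes NumPy; the helper `ols_summary`, the simulated data, and the seed are my own illustrative assumptions. It adds one irrelevant regressor at a time and compares the sign of the change in the adjusted R2 with whether that regressor's absolute t-statistic exceeds one.

```python
import numpy as np

def ols_summary(X, y):
    """Return (adjusted R2, t-statistics) for OLS of y on X; X includes the intercept."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    s2 = rss / (n - k)                                  # unbiased error-variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient standard errors
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    return adj_r2, beta / se

rng = np.random.default_rng(1)      # arbitrary seed
n = 40
x1 = rng.normal(size=n)
y = 0.5 + 1.5 * x1 + rng.normal(size=n)
base = np.column_stack([np.ones(n), x1])
adj_before, _ = ols_summary(base, y)

results = []
for _ in range(5):
    extra = rng.normal(size=n)      # candidate regressor, irrelevant by construction
    adj_after, t = ols_summary(np.column_stack([base, extra]), y)
    abs_t, delta = abs(t[-1]), adj_after - adj_before
    results.append((abs_t, delta))
    print(f"|t| = {abs_t:.3f}   change in adjusted R2 = {delta:+.5f}")
```

In every draw, the change should be positive exactly when |t| is above one, regardless of whether the regressor would pass a conventional significance test.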

If you drop a regressor from the model, the result runs in reverse: RA2 will increase (decrease) if that regressor's absolute t-statistic is less (greater) than one. Dropping a regressor amounts to imposing a (zero) restriction on its coefficient. If you square the t-statistic, you get an F-statistic, and it's exactly the F-statistic for testing if the single linear restriction is valid. Not surprisingly, then, there's a more general result than the one given above - one that applies to a situation where several regressors are simultaneously added to or dropped from the model.

Adding a group of regressors to the model will increase (decrease) RA2 depending on whether the F-statistic for testing that their coefficients are all zero is greater (less) than one in value. RA2 is unchanged if that F-statistic is exactly equal to one.
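
The group version can be checked in the same way. In this sketch (NumPy assumed; the function names, the simulated data, and the choice of q = 3 added regressors are hypothetical), the F-statistic is computed from the restricted and unrestricted residual sums of squares and compared with the change in the adjusted R2.

```python
import numpy as np

def rss_and_tss(X, y):
    """Residual and total (centered) sums of squares for OLS of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid, np.sum((y - y.mean()) ** 2)

def adj_r2(rss, tss, n, k):
    """Adjusted R2 with k regressors (including the intercept)."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

rng = np.random.default_rng(2)      # arbitrary seed
n, q = 60, 3                        # q = number of regressors added as a group
x1 = rng.normal(size=n)
y = 1.0 + x1 + rng.normal(size=n)
X_r = np.column_stack([np.ones(n), x1])     # restricted (smaller) model
rss_r, tss = rss_and_tss(X_r, y)

checks = []
for _ in range(5):
    extras = rng.normal(size=(n, q))        # irrelevant by construction
    X_u = np.column_stack([X_r, extras])    # unrestricted (larger) model
    k = X_u.shape[1]
    rss_u, _ = rss_and_tss(X_u, y)
    F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    delta = adj_r2(rss_u, tss, n, k) - adj_r2(rss_r, tss, n, k - q)
    checks.append((F, delta))
    print(f"F = {F:.3f}   change in adjusted R2 = {delta:+.5f}")
```

The adjusted R2 should rise exactly in those draws where F exceeds one.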

So, you can increase the adjusted coefficient of determination by adding regressors that are statistically insignificant, but the situation isn't quite as bad as with the usual (unadjusted) R2.

Finally, the second result given above generalizes to the case where we are considering any set of linear restrictions on the regression coefficients - not just zero restrictions.

In summary, and not too surprisingly, the behaviour of the adjusted coefficient of determination as we add or delete regressors is quite systematic. If you're a student who's hoping that deleting a regressor with a t-statistic of 1.1 will increase the value of RA2, think again!


© 2013, David E. Giles

14 comments:

  1. Greene "Econometric Analysis" Chapter 3, Exercise 8.

  2. Great little series.

    I'm troubled by the last sentence, though: “If you're a student who's hoping that deleting a regressor with a t-statistic of 1.1 will increase the value of RA2, think again!”

    Fiddling with the list of regressors seems like a GREAT way to exhaust your degrees of freedom without realizing such. In some of my equations with inherently low power I ding the d.f. by one for subsequent regressions AFTER I've tried and removed a regressor. That seems consistent with the spirit of better assessing how the equations will work out of sample.

  3. Great post. I'd love to see the proofs of these results discussed in your very clear style.

  4. Hi Dave,

    Thanks for the post.
    Can you clarify a doubt? What's the problem with a model when R² reaches a value greater than 1? Are there too many predictors?
    Thanks in advance.
    Amom

    Replies
    1. This can't happen if you're using OLS with an intercept in the model.

  5. Hi Dave,
    I was wondering if you did post a proof for this.

    Replies
    1. Zeba - I overlooked this. Once I've finished grading the current exams, I'll post a short proof.

    2. Zeba - see my new post, here: http://davegiles.blogspot.ca/2014/04/proof-of-result-about-adjusted.html

  6. Hi Dave,
    I ran a multiple regression model. I had a variable which had a very low correlation with the dependent variable, and the scatter plot showed no trend. When I ran the regression with only this variable it came out insignificant, as expected.
    Now, when I include it with the other variables, it comes out significant. Also, when I drop it, the model's R-squared drops significantly, by 8%.
    1. Can you hypothesise what's happening? 2. Should I throw away a variable that adds an incremental 8% to the R-squared just because it's insignificant, shows no trend in the scatter plot, and has a poor correlation coefficient?

    Replies
    1. Sounds like a case of multicollinearity to me. I bet the regressor in question is highly correlated with the other regressors. Other than that I can't tell without looking at your data - and I'm afraid I don't have time to do that.
