A while back I wrote about the fact that $R^2$ (the coefficient of determination for a linear regression model) is a sample statistic, and as such it has a sampling distribution. In that post, and in follow-up posts here and here, I discussed some of the properties of that sampling distribution, including the mean and variance of $R^2$ in certain circumstances.
Let's take that discussion a step further by comparing the MSE's of $R^2$ and its "adjusted" counterpart. Throughout, the model is the usual linear regression set-up:
$y = X\beta + \varepsilon$ ; $\varepsilon \sim N[0, \sigma^2 I_n]$ .
The coefficient of determination can be expressed in various (equivalent) ways. Let's write it as:
$R^2 = 1 - (e'e) / (y^{*\prime}y^{*})$ ,
where $y^{*}$ is the $y$ vector, but expressed as deviations about the sample mean; and $e$ is the OLS residual vector, $e = y - Xb$, where $b = (X'X)^{-1}X'y$.
The "adjusted" R2 is:
RA2 = 1 - [(e'e ) / (n - k)] / [(y*'y*) / (n - 1)],
where k is the number of regressors (including the intercept).
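As an aside, both measures are easy to compute directly from these matrix formulas. Here is a minimal sketch in Python/NumPy (the function name is just illustrative; it assumes the regressor matrix already contains a column of ones for the intercept):

```python
import numpy as np

def r2_and_adjusted_r2(y, X):
    """R^2 and adjusted R^2, computed from the matrix formulas above.

    X is the full (n x k) regressor matrix and is assumed to include
    a column of ones for the intercept.
    """
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)     # b = (X'X)^{-1} X'y
    e = y - X @ b                             # OLS residual vector
    y_star = y - y.mean()                     # y in deviations from its sample mean
    r2 = 1.0 - (e @ e) / (y_star @ y_star)
    r2_adj = 1.0 - ((e @ e) / (n - k)) / ((y_star @ y_star) / (n - 1))
    return r2, r2_adj
```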
Each of the sums of squares in the original $R^2$ formula is divided by the appropriate degrees of freedom. You'll recall the following results:
- $R_A^2 \leq R^2$.
- Unlike $R^2$, $R_A^2$ can take negative values.
- Although $R^2$ cannot decrease if we add a regressor to the model, $R_A^2$ will decrease if the (usual) t-statistic associated with that regressor is less than one in absolute value. (See here and here, and the sketch immediately below this list.)
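As a quick illustration of that last point, here is a minimal sketch (Python, using statsmodels; the sample size, seed, and variable names are just arbitrary choices for the illustration). The extra regressor x2 is generated independently of y, so its t-statistic will typically be small; $R^2$ cannot fall when x2 is added, but $R_A^2$ falls whenever that t-statistic is less than one in absolute value:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(123)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                      # irrelevant candidate regressor
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x1)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("t-statistic on x2:", big.tvalues[2])
print("R^2:      ", small.rsquared, "->", big.rsquared)          # never decreases
print("adj. R^2: ", small.rsquared_adj, "->", big.rsquared_adj)  # falls iff |t| < 1
```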
Now, note that the relationship between $R^2$ and $R_A^2$ can be written in various ways, including:
$R_A^2 = 1 - [(n - 1) / (n - k)](1 - R^2)$ .    (1)
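For completeness, (1) follows in one line from the two definitions, because $1 - R^2 = (e'e)/(y^{*\prime}y^{*})$:

```latex
\begin{align*}
R_A^2 \;=\; 1 - \frac{(e'e)/(n-k)}{(y^{*\prime}y^{*})/(n-1)}
      \;=\; 1 - \left(\frac{n-1}{n-k}\right)\frac{e'e}{y^{*\prime}y^{*}}
      \;=\; 1 - \left(\frac{n-1}{n-k}\right)\left(1 - R^2\right) .
\end{align*}
```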
In one of the earlier, related posts I showed that the following results hold in the special situation where there is no linear relationship between y and the (non-intercept) regressors:
$E[R^2] = (k - 1) / (n - 1)$    (2)
$\text{Var}[R^2] = [(k - 1)(n - k)] / [n(n - 1)^2]$    (3)
These results were obtained by exploiting the relationship between $R^2$ and the F-statistic that we use to test the joint significance of the regressors. Notice that when the null hypothesis for this F-test is true (and there is no linear relationship), the population coefficient of determination is zero.
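For reference, the link in question is just the standard identity connecting $R^2$ and that F-statistic:

```latex
\begin{align*}
F \;=\; \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
\qquad\Longleftrightarrow\qquad
R^2 \;=\; \frac{(k-1)F}{(k-1)F + (n-k)} .
\end{align*}
```

Because $R^2$ is a monotonically increasing function of $F$, the null (central $F_{k-1,\,n-k}$) distribution of the test statistic pins down the null distribution of $R^2$, and that is what is exploited to obtain the moments in (2) and (3).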
So, we see that the usual sample $R^2$ is an upwards-biased estimator of the population $R^2$ (in this special case), and its MSE is:
$\text{MSE}[R^2] = k(k - 1) / [n(n - 1)]$ .    (4)
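Spelling out the arithmetic behind (4): because the population $R^2$ is zero in this case, the bias of $R^2$ is just its mean in (2), so

```latex
\begin{align*}
\text{MSE}[R^2] \;=\; \text{Var}[R^2] + \left(\text{Bias}[R^2]\right)^2
  \;=\; \frac{(k-1)(n-k)}{n(n-1)^2} + \frac{(k-1)^2}{(n-1)^2}
  \;=\; \frac{(k-1)\left[(n-k) + n(k-1)\right]}{n(n-1)^2}
  \;=\; \frac{k(k-1)}{n(n-1)} ,
\end{align*}
```

where the last step uses $(n - k) + n(k - 1) = k(n - 1)$.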
Using the results in (2) and (3), it follows immediately from (1) that:
$E[R_A^2] = 0$ ,    (5)
$\text{Var}[R_A^2] = (k - 1) / [n(n - k)]$ ,    (6)
and
$\text{MSE}[R_A^2] = \text{Var}[R_A^2] = (k - 1) / [n(n - k)]$ .    (7)
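In case that step is too quick: from (1), $R_A^2$ is a linear function of $R^2$, so

```latex
\begin{align*}
E[R_A^2]   &= 1 - \frac{n-1}{n-k}\left(1 - E[R^2]\right)
            = 1 - \frac{n-1}{n-k}\cdot\frac{n-k}{n-1} = 0 , \\[4pt]
\text{Var}[R_A^2] &= \left(\frac{n-1}{n-k}\right)^{2}\text{Var}[R^2]
            = \left(\frac{n-1}{n-k}\right)^{2}\frac{(k-1)(n-k)}{n(n-1)^{2}}
            = \frac{k-1}{n(n-k)} ,
\end{align*}
```

and (7) follows because an unbiased estimator's MSE is just its variance.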
From (5), we can see that "adjusting" the coefficient of determination eliminates the bias associated with $R^2$ (when there is no linear relationship between $y$ and the regressors). Comparing the variances of $R^2$ and $R_A^2$, we can see that the elimination of this bias comes at the expense of increased variability, because $\text{Var}[R^2] \leq \text{Var}[R_A^2]$ .
To see that this inequality holds, note that
$\text{Var}[R^2] / \text{Var}[R_A^2] = [(n - k) / (n - 1)]^2 \leq 1$ ,
because $k \geq 1$.
Comparing the MSE's of $R^2$ and $R_A^2$, given in (4) and (7), we have the following result:
$\Delta = \text{MSE}[R^2] - \text{MSE}[R_A^2] = (k - 1)[k(n - k) - (n - 1)] / [n(n - k)(n - 1)]$ ,
and because $k(n - k) - (n - 1) = kn - k^2 - n + 1 = (k - 1)(n - k - 1)$, it follows that
$\text{sgn}(\Delta) = \text{sgn}[k(n - k) - (n - 1)] = \text{sgn}[(k - 1)(n - k - 1)]$ .
The last expression is non-negative whenever $(n - k) \geq 1$. So, as long as we have at least one degree of freedom, the "adjusted" coefficient of determination has an MSE no larger than that of the usual (sample) $R^2$ when the true population $R^2$ is zero (and strictly smaller whenever $k > 1$ and $n - k > 1$).
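If you want to check that factorization symbolically, here is a tiny sketch using Python's sympy (purely a verification of the algebra, nothing new):

```python
import sympy as sp

n, k = sp.symbols('n k', positive=True)
delta_numerator = k * (n - k) - (n - 1)                      # bracketed term in Delta
print(sp.simplify(delta_numerator - (k - 1) * (n - k - 1)))  # prints 0: the two forms agree
```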
In addition, notice that both of these MSE's go to zero as $n \to \infty$ (with $k$ fixed), so both of these goodness-of-fit measures are mean-square consistent, and hence weakly consistent, estimators of the unobserved population $R^2$ (which is zero in this special case).
When there is a linear relationship between $y$ and the regressors in our model, the F-statistic noted above has a non-central F distribution; $R^2$ and $R_A^2$ can be written as functions of a non-central Beta random variable (see here); and the associated non-centrality parameter is a function of the $X$ data and the true values of all of the parameters in the regression model.
In that case, any bias, variance, and MSE comparisons between the unadjusted and adjusted coefficients of determination will depend on the values of all of these quantities, not all of which are observable, of course!
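Finally, if you'd like to see the "no relationship" case in action, here is a small Monte Carlo sketch (Python/NumPy; the choices of n, k, the seed, and the number of replications are arbitrary, purely for illustration). It checks the unbiasedness results in (2) and (5) by simulation, and compares the simulated MSE's of the two measures about the true (zero) population $R^2$:

```python
import numpy as np

rng = np.random.default_rng(2024)
n, k, n_rep = 40, 4, 20000          # k counts the intercept as one of the regressors

r2 = np.empty(n_rep)
r2_adj = np.empty(n_rep)
for i in range(n_rep):
    # No linear relationship: y is generated independently of the regressors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    y_star = y - y.mean()
    r2[i] = 1 - (e @ e) / (y_star @ y_star)
    r2_adj[i] = 1 - ((e @ e) / (n - k)) / ((y_star @ y_star) / (n - 1))

# The population R^2 is zero here, so the MSE of each measure is just
# the mean of its squared value across replications.
print("mean of R^2:     ", r2.mean(), " (theory:", (k - 1) / (n - 1), ")")
print("mean of adj. R^2:", r2_adj.mean(), " (theory: 0)")
print("MSE of R^2:      ", np.mean(r2 ** 2))
print("MSE of adj. R^2: ", np.mean(r2_adj ** 2))
```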
Great post! I was looking for some tips on which of the two, $R^2$ or $R^2_{adj}$, is a better estimator (say, in the MSE sense) of the population $R^2$, and here it is. Of course, the really interesting case is when at least some (if not all) of the regressors truly belong in the model, but I understood that there is no general result (one that would not depend on the true slope coefficients) for that case. On a side note, it does not seem that you have made use of the normality assumption anywhere in the derivation. If so, why include it at all? And if you do include it, a note on its irrelevance could be handy. It is my perception that too many people do not realize how little the normality assumption matters in deriving the standard results for OLS estimators. In any case, thank you for the great post!
The Normality assumption is in fact needed and used for the bias and variance expressions. See the reference to Cramer's paper in this earlier post: http://davegiles.blogspot.ca/2013/05/good-old-r-squared.html