Econometrics Beat: Dave Giles' Blog: Unbiased Model Selection Using the Adjusted R-Squared

Saturday, August 3, 2013

Unbiased Model Selection Using the Adjusted R-Squared

The coefficient of determination (R²), and its "adjusted" counterpart, really don't impress me much! I often tell students that this statistic is one of the last things I look at when appraising the results of estimating a regression model.

Previously, I've had a few things to say about this measure of goodness-of-fit (e.g., here and here). In this post I want to say something positive, for once, about "adjusted" R². Specifically, I'm going to talk about its use as a model-selection criterion.

I decided to prepare this particular post as a result of some comments/questions that came from one particular reader of my recent piece, Information Criteria Unveiled. The question was, "what do we mean when we talk about a model-selection criterion being unbiased?"

I think I finally responded adequately, but I promised to put together a follow-up post with more information. A good way of illustrating the concept in question is to see why choosing between alternative regression model specifications, by maximizing the adjusted R², can be described as an "unbiased" model-selection criterion. What follows is due, originally, to Theil (1957).

Suppose we have two linear regression models, each explaining the same dependent variable, y:

M₁: y = X₁β₁ + ε₁

M₂: y = X₂β₂ + ε₂.

In each case, the same sample of n observations is available for y. Suppose that X₁ and X₂ are each non-stochastic and of full column rank, k₁ and k₂ respectively. Finally, suppose that ε₁ has a zero mean, and is serially independent and homoskedastic (with a variance of σ₁²).

Notice that no assumptions are being made about the error term in M₂, namely ε₂. So, what we are doing here is setting up M₁ to be the data-generating process (or "true model"), while M₂ is a "false model".

Now, recall that the adjusted coefficient of determination is the quantity,

R_A² = 1 - [e'e / (n - k)] / s_y²,

where s_y² is the (unbiased) sample variance of the y data, and e is the OLS residual vector.

Clearly, for a given sample of y data, R_A² increases monotonically as [e'e /(n - k)] decreases. So, choosing the model with the larger R_A² value is equivalent to choosing the model with the smaller value of s² = [e'e /(n - k)].
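To make that equivalence concrete, here is a minimal sketch in Python with NumPy. The simulated data, the seed, and the helper names `ols_s2` and `adjusted_r2` are illustrative assumptions, not part of the original argument. Because s_y² is common to both models, the model with the smaller s² is necessarily the one with the larger R_A².

```python
import numpy as np

def ols_s2(y, X):
    """Return s^2 = e'e / (n - k) from an OLS fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return float(e @ e) / (n - k)

def adjusted_r2(y, X):
    """Adjusted R^2 = 1 - s^2 / s_y^2, with s_y^2 the unbiased sample variance of y."""
    return 1.0 - ols_s2(y, X) / float(np.var(y, ddof=1))

# Hypothetical data: y is generated from x1; x2 is an unrelated regressor.
rng = np.random.default_rng(0)  # arbitrary seed
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])

s2 = [ols_s2(y, X1), ols_s2(y, X2)]
r2a = [adjusted_r2(y, X1), adjusted_r2(y, X2)]

# The two rules pick the same model, because s_y^2 is the same in both cases.
same_choice = int(np.argmin(s2)) == int(np.argmax(r2a))
print("s^2:", s2)
print("adjusted R^2:", r2a)
print("same model chosen:", same_choice)
```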

Let's focus on this latter quantity, in the case of M₂:

s₂² = (e₂'e₂) / (n - k₂) = (y'P₂y) / (n - k₂) ; where e₂ is the OLS residual vector for M₂, and P₂ = I - X₂(X₂'X₂)⁻¹X₂' .

So,

(n - k₂)s₂² = (X₁β₁ + ε₁)' P₂(X₁β₁ + ε₁) (*)

= β₁'X₁'P₂X₁β₁ + 2β₁'X₁'P₂ε₁ + ε₁'P₂ε₁

≥ 2β₁'X₁'P₂ε₁ + ε₁'P₂ε₁,

because P₂ is symmetric and idempotent, so that β₁'X₁'P₂X₁β₁ = (P₂X₁β₁)'(P₂X₁β₁) ≥ 0.
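The algebra used so far can be checked numerically. The following is only a sketch, under assumed dimensions, coefficients and seed (all hypothetical). It confirms that P₂ is symmetric and idempotent with trace n - k₂, that the residual sum of squares from regressing y on X₂ equals y'P₂y, and that the three-term expansion adds up, with a non-negative first term.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n, k2 = 25, 3

# Hypothetical design matrices: M1 is the "true" model, M2 the "false" one.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, k2 - 1))])
beta1 = np.array([0.5, -1.0])
eps1 = rng.normal(size=n)
y = X1 @ beta1 + eps1

# P2 = I - X2 (X2'X2)^{-1} X2'
P2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# OLS residuals from regressing y on X2
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
e2 = y - X2 @ b2

rss = float(e2 @ e2)          # e2'e2
quad = float(y @ P2 @ y)      # y'P2y

m = X1 @ beta1                # the systematic part of y under M1
signal = float(m @ P2 @ m)            # beta1'X1'P2X1beta1 (non-negative)
cross = 2.0 * float(m @ P2 @ eps1)    # 2 beta1'X1'P2 eps1
noise = float(eps1 @ P2 @ eps1)       # eps1'P2 eps1

print("trace(P2) =", np.trace(P2), " n - k2 =", n - k2)
print("e2'e2 - y'P2y =", rss - quad)
print("first term of the expansion =", signal)
```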

Then, using the results that E[ε₁] = 0, and E[ε₁ε₁'] = σ₁²I_n, we have:

E[(n - k₂)s₂²] ≥ E[ε₁'P₂ε₁]
= E{tr.[ε₁'P₂ε₁]}
= E{tr.[P₂ε₁ε₁']}
= tr.(P₂E[ε₁ε₁'])
= tr.(P₂σ₁²I_n)
= σ₁²tr.(P₂)
= σ₁²(n - k₂).
So,
E[s₂²] ≥ σ₁² = E[s₁²] .

In other words, if we choose the smaller s² (or, larger R_A²), we'll select the true model, M₁, on average.

It's in this particular sense that the "maximize R_A²" rule is an unbiased model-selection criterion.
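A small Monte Carlo experiment illustrates the result. This is a sketch under assumed settings: the sample size, coefficients, error variance, normally distributed errors and seed are all hypothetical choices. The regressors are held fixed across replications, as in the derivation, and only ε₁ is redrawn. The average of s₁² should sit close to σ₁², while the average of s₂² should sit close to σ₁² + β₁'X₁'P₂X₁β₁ / (n - k₂), which is the expectation implied by line (*).

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n, reps, sigma1 = 40, 20000, 1.5

# Fixed (non-stochastic) regressors: drawn once, then held constant.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # true model, M1
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # false model, M2
beta1 = np.array([1.0, 2.0])

def residual_maker(X):
    """Return P = I - X (X'X)^{-1} X'."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

P1, P2 = residual_maker(X1), residual_maker(X2)

# Each column of Y is one simulated sample generated by M1.
eps = sigma1 * rng.normal(size=(n, reps))
Y = (X1 @ beta1)[:, None] + eps

s1_sq = np.sum((P1 @ Y) ** 2, axis=0) / (n - X1.shape[1])
s2_sq = np.sum((P2 @ Y) ** 2, axis=0) / (n - X2.shape[1])

mean_s1, mean_s2 = float(s1_sq.mean()), float(s2_sq.mean())

# Exact expectation of s2^2 implied by the expansion at line (*).
m = X1 @ beta1
expected_s2 = sigma1**2 + float(m @ P2 @ m) / (n - X2.shape[1])

print("sigma1^2           :", sigma1**2)
print("average s1^2       :", mean_s1)
print("average s2^2       :", mean_s2)
print("theoretical E[s2^2]:", expected_s2)
```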

Now, there are (at least) three things to notice about the derivation of this result:

1. One of the models under consideration had to be the true model, in the sense that its error term was "well-behaved".
2. If we'd replaced y with (X₂β₂ + ε₂) at line (*), this would have been correct, but totally unhelpful, as we know nothing about the properties of ε₂.
3. If the columns of X₂ include all of the columns of X₁, then P₂X₁ = 0, and we'd end up with the result that E[s₂²] = σ₁² = E[s₁²] . Of course, in this case of "nested" models, presumably we'd select between them by testing the restrictions that make M₂ collapse to M₁.
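The nested case is easy to verify numerically. In this sketch (hypothetical dimensions and seed), X₂ is built by appending one extra column to X₁, and P₂X₁ comes out as zero up to rounding error, so the first term in the expansion at line (*) vanishes.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 30

X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([X1, rng.normal(size=n)])  # X2 contains every column of X1

P2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

# Largest absolute element of P2 X1; zero apart from floating-point error.
max_abs = float(np.max(np.abs(P2 @ X1)))
print("max |P2 X1| =", max_abs)
```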

This basic result has been extended in several different directions by various authors over the years. For example, Kloek (1975) shows that minimizing s² is a strongly consistent model-selection criterion, as long as the correct model specification is one of those being considered; and Schmidt (1974) shows that it is a weakly consistent selection criterion when the models have autocorrelated errors. Giles and Smith (1977) prove that this selection rule retains its property of unbiasedness when there are exact linear restrictions on the models' parameters; and it retains its property of weak consistency if, in addition, the models' errors are autocorrelated.

Giles and Sturmfels (1979) show that Theil's "unbiased model selection" result holds if the regressors are perfectly correlated, and a generalized inverse is used to estimate "estimable functions" of the parameters of the models being compared. Finally, Giles and Low (1981) prove that minimizing s² is a weakly consistent model-selection rule if models with random regressors are estimated using the method of Instrumental Variables.

So, indeed, there is some justification for choosing among competing regression models by picking the one with the largest (adjusted) R² value.

As you might have guessed, though, it's not all good news!

Apart from the very strong requirement that the true model specification has to be among those considered, getting things right, on average, isn't necessarily much comfort! The situation is analogous to that of having an unbiased estimator - which may have a very large variance.

Schmidt (1973) and Ebberler (1975) have explored the probability of selecting the true model by using the "maximize adjusted R²" rule. The distribution of R_A² depends (among other things) on the regressors that appear in the models - see my post here. So, only illustrative results can be obtained for these probabilities. The results are not particularly encouraging. As you'd no doubt guess, you can select the correct model on average, but the probability of making a mistake can be quite high!
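A rough way to see this for yourself is to simulate the selection rule and count how often it picks the true model. The sketch below is purely illustrative and is not a reproduction of the calculations in the papers just cited. The design (a small sample, a weak signal, normally distributed errors, and a false-model regressor that is correlated with the true one) and the seed are all assumptions, and the resulting frequency will change with every one of them.

```python
import numpy as np

rng = np.random.default_rng(2013)  # arbitrary seed
n, reps, sigma1 = 20, 10000, 2.0

# Hypothetical fixed design: x2 is correlated with the true regressor x1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])  # true model, M1
X2 = np.column_stack([np.ones(n), x2])  # false model, M2
beta1 = np.array([1.0, 0.5])

def residual_maker(X):
    """Return P = I - X (X'X)^{-1} X'."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

P1, P2 = residual_maker(X1), residual_maker(X2)

# Each column of Y is one simulated sample generated by M1.
Y = (X1 @ beta1)[:, None] + sigma1 * rng.normal(size=(n, reps))

s1_sq = np.sum((P1 @ Y) ** 2, axis=0) / (n - X1.shape[1])
s2_sq = np.sum((P2 @ Y) ** 2, axis=0) / (n - X2.shape[1])

# Share of samples in which the "minimize s^2" rule selects the true model.
p_correct = float(np.mean(s1_sq < s2_sq))
p_wrong = 1.0 - p_correct
print("share of samples selecting M1:", p_correct)
print("share of samples selecting M2:", p_wrong)
```

Making the signal stronger (a larger slope in `beta1`, or a smaller `sigma1`) or weakening the correlation between `x1` and `x2` should raise the share of correct selections, which is one way to see why only illustrative results are available.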

References

Ebberler, D. H., 1975. On the probability of correct model selection using the maximum (adjusted) R² choice criterion. International Economic Review, 16, 516-520.

Giles, D. E. A. and C. K. Low, 1981. Choosing between alternative structural equations estimated by instrumental variables. Review of Economics and Statistics, 63, 476-478.

Giles, D. E. A. and R. G. Smith, 1977. A note on the minimum error variance rule and the restricted regression model. International Economic Review, 18, 247-251.

Giles, D. E. A. and B. M. Sturmfels, 1979. Choosing between rank-deficient restricted models. New Zealand Economic Papers, 13, 202-210.

Kloek, T., 1975. Note on a large-sample result in specification analysis. Econometrica, 43, 933-936.

Schmidt, P., 1973. Calculating the power of the minimum standard error choice criterion. International Economic Review, 14, 253-255.

Schmidt, P., 1974. A note on Theil's minimum standard error criterion when the disturbances are autocorrelated. Review of Economics and Statistics, 56, 122-123.

Theil, H., 1957. Specification errors and the estimation of economic relationships. Review of the International Statistical Institute, 25, 41-51.

2 comments:

Anonymous, August 5, 2013 at 10:24 AM
Item 3 cannot be emphasized enough. Otherwise, the statement that it is a consistent procedure can be rather misleading, because, without that condition, it would be inconsistent with the definition of consistency in more recent literature (see Shao (1997) for example).