Econometrics Beat: Dave Giles' Blog: Least Squares, Perfect Multicollinearity, & Estimable Functions

Friday, September 19, 2014

Least Squares, Perfect Multicollinearity, & Estimable Functions

This post is essentially an extension of another recent post on this blog. I'll assume that you've read that post, where I discussed the problem of solving linear equations of the form Ax = y, when the matrix A is singular.

Let's look at how this problem might arise in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. In the previous post, I said:

"Least squares estimation leads to the so-called "normal equations":

X'Xb = X'y . (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of β. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)^-1 is well-defined. We then pre-multiply each side of (1) by (X'X)^-1, yielding the familiar least squares estimator for β, namely b = (X'X)^-1 X'y.

So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.

What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)^-1, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector."

I promised that I'd come back to the statement, "we can't get a unique estimator for the (full) β vector". Now's the time to do that.

What we're going to be concerned with is solving the normal equations for b, in the case where (X'X) is singular - it has less than full rank. What we saw in the previous post was that we can make some progress with this problem by considering a "generalized inverse" of (X'X), rather than the "regular" inverse (which is not defined in this case).

The best way to think about multicollinearity in a regression setting is that it reflects a shortage of information. Sometimes additional information can be obtained via additional data. Sometimes we can "inject" additional information into the problem by means of exact or stochastic restrictions on the parameters. (The latter is how the problem is avoided in a Bayesian setting.) Sometimes, we can't do either of these things.

Here, I'll focus on the most extreme case possible - one where we have "perfect multicollinearity". That's the case where X has less than full rank, so that (X'X) doesn't have a regular inverse. It's the situation outlined above.

For the least squares estimator, b, to be defined, we need to be able to solve the normal equation, (1). What we're interested in, of course, is a solution for every element of the b vector. This is simply not achievable in the case of perfect multicollinearity. There's not enough information in the sample for us to be able to uniquely identify and estimate every individual regression coefficient. However, we should be able to identify and estimate certain linear combinations of those coefficients. These combinations are usually referred to as "estimable functions" of the parameters.
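To make the non-uniqueness concrete, here is a minimal NumPy sketch. It is not taken from the EViews workfile for this post: the data are random, and the design (an intercept, x, and 2x) is a hypothetical one chosen only because its redundancy is obvious. It builds two different solutions to the normal equations and then compares the value each gives for one linear combination, b₁ + 2b₂, that the data should be able to pin down.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed; illustrative data only
n = 30
x = rng.normal(size=n)
y = rng.normal(size=n)

# Hypothetical rank-deficient design: intercept, x, and 2x
X = np.column_stack([np.ones(n), x, 2.0 * x])
XtX = X.T @ X
Xty = X.T @ y

# One solution of the normal equations: the minimum-norm least squares solution
b_a = np.linalg.lstsq(X, y, rcond=1e-10)[0]

# X @ v = 0 for v = (0, 2, -1), so adding any multiple of v gives another solution
v = np.array([0.0, 2.0, -1.0])
b_b = b_a + 5.0 * v

# Both vectors should satisfy X'Xb = X'y up to rounding error
resid_a = XtX @ b_a - Xty
resid_b = XtX @ b_b - Xty
print("max |X'Xb - X'y|:", np.abs(resid_a).max(), np.abs(resid_b).max())

# The individual elements differ, but the combination c'b with c = (0, 1, 2) does not
c = np.array([0.0, 1.0, 2.0])
print("solution a:", b_a)
print("solution b:", b_b)
print("c'b for each:", c @ b_a, c @ b_b)
```

The two solution vectors disagree element by element, yet they agree on c'b, because c is orthogonal to the direction v along which the solutions are free to move.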

Let's think about a simple example. Suppose that we were foolish enough to try and estimate the following regression model by least squares:

y = β₀ + β₁x₁ + β₂ x₂ + β₃ (x₁ + x₂) + ε . (2)

The third regressor, (x₁ + x₂), is a linear combination of the other two regressors, so although the X matrix has 4 columns (allowing for the column of ones for the intercept), it has a rank of 3. So, (X'X) is singular, and the least squares estimator of the coefficient vector can't be computed in the usual way. However, notice that we can re-write (2) as:
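If you'd like to check the rank claim numerically, the following sketch does so in NumPy rather than EViews. The regressors are random draws (an assumption for illustration, not the data used in this post). It confirms that the 4-column X for model (2) has rank 3, and that the vector v = (0, 1, 1, -1) is annihilated by both X and X'X, which is what singularity of X'X amounts to here.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed; illustrative data only
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Columns: intercept, x1, x2, and the redundant regressor (x1 + x2)
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])

rank_X = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank_X)

# The exact linear dependence: x1 + x2 - (x1 + x2) = 0
v = np.array([0.0, 1.0, 1.0, -1.0])
print("max |X v|    :", np.abs(X @ v).max())
print("max |X'X v|  :", np.abs(X.T @ X @ v).max())
```

Both of the last two numbers should be zero up to floating-point rounding.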

y = β₀ + (β₁+ β₃) x₁ + (β₂+ β₃) x₂ + ε . (3)

Now the X matrix has just 3 columns, and it has full column rank. Least squares estimation is feasible. However, what we'll get estimates of are β₀, (β₁ + β₃), and (β₂ + β₃). We won't get separate estimates of β₁, β₂, and β₃ themselves. Here, the estimable functions of the parameters are β₀, (β₁ + β₃), and (β₂ + β₃).
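A standard way to test whether a linear combination c'β is estimable is to ask whether c lies in the row space of X. The sketch below checks this for model (2), using the fact that pinv(X) X is the orthogonal projector onto that row space. The data are random, and the helper name `is_estimable` is my own, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed; illustrative data only
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])   # design for model (2)

# Orthogonal projector onto the row space of X.
# The explicit cutoff (1e-10) makes the tiny singular value count as zero.
P = np.linalg.pinv(X, 1e-10) @ X

def is_estimable(c, tol=1e-8):
    """Hypothetical helper: c'beta is estimable iff projecting c leaves it unchanged."""
    c = np.asarray(c, dtype=float)
    return bool(np.allclose(c @ P, c, atol=tol))

checks = {
    "beta0":         [1, 0, 0, 0],
    "beta1 + beta3": [0, 1, 0, 1],
    "beta2 + beta3": [0, 0, 1, 1],
    "beta1 - beta2": [0, 1, -1, 0],
    "beta1 alone":   [0, 1, 0, 0],
    "beta3 alone":   [0, 0, 0, 1],
}
for name, c in checks.items():
    print(f"{name:15s} estimable: {is_estimable(c)}")
```

The first three combinations are the ones read off from equation (3). The fourth, β₁ - β₂, is also estimable, as it is just the difference between (β₁ + β₃) and (β₂ + β₃). The individual coefficients β₁ and β₃ are not.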

We can relate all of this to the use of a generalized inverse, as discussed in the earlier post. I'll illustrate the connection using some simple matrix manipulations in EViews. The associated workfile and data are on the code and data pages for this blog.

Using some arbitrary data for y, x₁ and x₂, I've estimated the following model by least squares:

y = γ₀ + γ₁x₁ + γ₂ x₂ + v . (4)

Here are my results:

Notice that γ₀, γ₁, and γ₂ in (4) correspond to (the estimable functions of the parameters) β₀, (β₁ + β₃), and (β₂ + β₃) in equation (3).

Next, I've used some simple matrix commands to construct the X matrix associated with the model in (2), and to use the Moore-Penrose generalized inverse of (X'X) to get a solution to the associated normal equations.

The results that I get are:

These four values are not parameter estimates. They comprise a particular set of solutions to the normal equations for model (2). However, notice the following.

The least squares estimate of γ₀ is the same as the solution for β₀, namely -0.083283.
The least squares estimate of γ₁ is the same as the sum of the solutions for β₁ and β₃. That is, -0.04293 = (-0.07437 + 0.03144).
The least squares estimate of γ₂ is the same as the sum of the solutions for β₂ and β₃. That is, 0.137255 = (0.105814 + 0.031441).
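The same comparison can be reproduced outside EViews. The sketch below uses NumPy with randomly generated y, x₁ and x₂ (so the numbers will not match the ones above, which come from the workfile for this post). It fits model (4) by ordinary least squares, solves the normal equations for model (2) with the Moore-Penrose inverse of (X'X), and then forms the sums listed above.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed; illustrative data only
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])            # model (4): full rank
X_def = np.column_stack([np.ones(n), x1, x2, x1 + x2])    # model (2): rank 3

# Ordinary least squares for model (4)
g = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)

# Moore-Penrose solution of the normal equations for model (2).
# The explicit cutoff (1e-10) makes the numerically zero singular value count as zero.
b = np.linalg.pinv(X_def.T @ X_def, 1e-10) @ (X_def.T @ y)

print("gamma0 vs b0      :", g[0], b[0])
print("gamma1 vs b1 + b3 :", g[1], b[1] + b[3])
print("gamma2 vs b2 + b3 :", g[2], b[2] + b[3])

# The Moore-Penrose solution is the minimum-norm one, so it is orthogonal to
# the null vector (0, 1, 1, -1); that is, b1 + b2 = b3.
print("b1 + b2 vs b3     :", b[1] + b[2], b[3])
```

Each pair should agree up to rounding. The last line is a property of this particular generalized inverse rather than of the data: a different generalized inverse would give different values for b₁, b₂ and b₃, but the same three sums.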

In other words, in the least squares results shown above for model (4) we've been able to estimate the estimable functions of the parameters, as identified in equation (3).

You might wonder, can we construct a set of results about the statistical properties of the solution to the normal equations that I've obtained here? In other words, is there a set of results about the estimators of the estimable functions that corresponds to the usual results about the least squares estimator of β in the full-rank case?

The answer is "yes". The full statistical analysis of the less-than-full-rank model is well-established, and well-known. If you want to follow up on this, a great place to start is with Searle (1971).

Reference

Searle, S. R., 1971. Linear Models. Wiley, New York.

5 comments:

Martin Sanders, September 29, 2014 at 10:51 PM
Very, very interesting, Prof. Giles, thank you. Though I think the use for interpretation is somewhat limited as it seems to me that the estimate for β₃ could be any number as long as the other two betas are "adjusted" accordingly. I will take a look in Searle, however.

Unknown, October 16, 2014 at 10:38 AM
I think that the Moore Penrose 'solution' in a dummy variable representation of a categorical variable is for the sum of the dummies to equal the 'intercept' - C in Eviews. Is that correct? Proof? Interesting - the ANOVA usually is that the sum is the negative of 'C' or the sum of all the coefficients=0. Bob Parks, WU St Louis

Example (Studenmund, 6th ed., pp. 76-88, Woody example)

Var     MPenrose      exclude
C       43995.72     18415.63
P           0.35         0.35
I           1.54         1.54
N2      40010.90     65590.99
N3      26438.20     52018.29
N4      14481.10     40061.19
N5      -1899.02     23681.07
N6       4917.72     30497.81
N7     -12033.12     13546.97
N8      -2339.97     23240.12
N9     -25580.09   (excluded)

Sum N2 to N9 (MPenrose) = 43995.72