## Friday, September 19, 2014

### Least Squares, Perfect Multicollinearity, & Estimable Functions

This post is essentially an extension of another recent post on this blog. I'll assume that you've read that post, where I discussed the problem of solving linear equations of the form Ax = y, when the matrix A is singular.

Let's look at how this problem might arise in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. In the previous post, I said:
"Least squares estimation leads to the so-called "normal equations":

X'Xb = X'y  .                                                                (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of b. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)⁻¹ is well-defined. We then pre-multiply each side of (1) by (X'X)⁻¹, yielding the familiar least squares estimator of β, namely b = (X'X)⁻¹X'y.
So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.
What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)⁻¹, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector."
I promised that I'd come back to the statement, "we can't get a unique estimator for the (full) β vector". Now's the time to do that.

What we're going to be concerned with is solving the normal equations for b, in the case where (X'X) is singular - it has less than full rank. What we saw in the previous post was that we can make some progress with this problem by considering a "generalized inverse" of (X'X), rather than the "regular" inverse (which is not defined in this case).
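To make this concrete, here's a small NumPy sketch (with made-up data of my own, not the EViews workfile used later in the post) showing that the normal equations can still be solved when (X'X) is singular, by using the Moore-Penrose generalized inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: the fourth regressor is x1 + x2, so X is rank-deficient.
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])   # 4 columns, rank 3
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))       # 3, not 4: (X'X) is singular

# A solution to the normal equations, X'Xb = X'y, via the generalized inverse:
b = np.linalg.pinv(XtX) @ X.T @ y

# b satisfies the normal equations even though (X'X)^(-1) does not exist:
print(np.allclose(XtX @ b, X.T @ y))    # True
```

The normal equations are always consistent (X'y lies in the column space of X'X), so a generalized-inverse solution always exists; what's lost in the singular case is uniqueness.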

The best way to think about multicollinearity in a regression setting is that it reflects a shortage of information. Sometimes additional information can be obtained via additional data. Sometimes we can "inject" additional information into the problem by means of exact or stochastic restrictions on the parameters. (The latter is how the problem is avoided in a Bayesian setting.) Sometimes, we can't do either of these things.

Here, I'll focus on the most extreme case possible - one where we have "perfect multicollinearity". That's the case where X has less than full rank, so that (X'X) doesn't have a regular inverse. It's the situation outlined above.

For the least squares estimator, b, to be defined, we need to be able to solve the normal equations, (1). What we're interested in, of course, is a solution for every element of the b vector. This is simply not achievable in the case of perfect multicollinearity. There's not enough information in the sample for us to be able to uniquely identify and estimate every individual regression coefficient. However, we should be able to identify and estimate certain linear combinations of those coefficients. These combinations are usually referred to as "estimable functions" of the parameters.

Let's think about a simple example. Suppose that we were foolish enough to try and estimate the following regression model by least squares:

y = β0 + β1 x1 + β2 x2 + β3 (x1 + x2) + ε  .                                             (2)

The fourth regressor, (x1 + x2), is a linear combination of two other regressors, so although the X matrix has 4 columns (allowing for the column of ones for the intercept), it has a rank of 3. So, (X'X) is singular, and the least squares estimator of the coefficient vector can't be computed in the usual way. However, notice that we can re-write (2) as:

y = β0 + (β1 + β3) x1 + (β2 + β3) x2 + ε  .                                             (3)

Now the X matrix has just 3 columns, and it has full column rank. Least squares estimation is feasible. However, what we'll get estimates of are β0, (β1 + β3), and (β2 + β3). We won't get separate estimates of β1, β2, or β3 themselves. Here, the estimable functions of the parameters are β0, (β1 + β3), and (β2 + β3).
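The point about uniqueness can be seen directly. Any vector in the null space of X can be added to one solution of the normal equations to get another, so the individual coefficients are indeterminate, but the estimable combinations are the same for every solution. A quick NumPy check (again with invented data, not the post's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data with the structure of model (2): the fourth regressor is x1 + x2.
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
y = 0.5 + 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

# One solution to the normal equations (the minimum-norm one, via pinv):
b = np.linalg.pinv(X) @ y

# The null space of X is spanned by (0, 1, 1, -1)', since
# 0*1 + 1*x1 + 1*x2 - 1*(x1 + x2) = 0. Adding any multiple of it
# gives another solution with identical fitted values:
z = np.array([0.0, 1.0, 1.0, -1.0])
b_alt = b + 5.0 * z
print(np.allclose(X @ b, X @ b_alt))                  # True

# Individual coefficients differ across the two solutions ...
print(np.isclose(b[1], b_alt[1]))                     # False
# ... but the estimable functions (b1 + b3) and (b2 + b3) do not:
print(np.isclose(b[1] + b[3], b_alt[1] + b_alt[3]))   # True
```

In general, λ'β is estimable exactly when λ lies in the row space of X, i.e., when λ is orthogonal to the null space; (1,0,0,0), (0,1,0,1), and (0,0,1,1) all pass that test here, matching the three estimable functions in (3).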

We can relate all of this to the use of a generalized inverse, as discussed in the earlier post. I'll illustrate the connection using some simple matrix manipulations in EViews. The associated workfile and data are on the code and data pages for this blog.

Using some arbitrary data for y, x1, and x2, I've estimated the following model by least squares:

y = γ0 + γ1 x1 + γ2 x2 + v  .                                                                 (4)

Here are my results:

Notice that γ0, γ1, and γ2 in (4) correspond to (the estimable functions of the parameters) β0, (β1 + β3), and (β2 + β3) in equation (3).

Next, I've used some simple matrix commands to construct the X matrix associated with the model in (2), and to use the Moore-Penrose generalized inverse of (X'X) to get a solution to the associated normal equations.

The results that I get are:

These four values are not parameter estimates. They comprise a particular set of solutions to the normal equations for model (2). However, notice the following.
• The least squares estimate of γ0 is the same as the solution for β0, namely -0.083283.
• The least squares estimate of γ1 is the same as the sum of the solutions for β1 and β3. That is, -0.04293 = (-0.07437 + 0.03144).
• The least squares estimate of γ2 is the same as the sum of the solutions for β2 and β3. That is, 0.137255 = (0.105814 + 0.031441).
In other words, in the least squares results shown above for model (4) we've been able to estimate the estimable functions of the parameters, as identified in equation (3).
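The same three-way correspondence can be reproduced outside EViews. Here's a NumPy sketch (with synthetic data standing in for my workfile, so the numerical values will differ from those above):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary data standing in for the EViews workfile.
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = -0.1 + 0.2 * x1 + 0.3 * x2 + rng.normal(size=n)

# Model (4): full-rank regression on a constant, x1, and x2.
Z = np.column_stack([np.ones(n), x1, x2])
g, *_ = np.linalg.lstsq(Z, y, rcond=None)          # (g0, g1, g2)

# Model (2): rank-deficient regression that also includes (x1 + x2).
X = np.column_stack([np.ones(n), x1, x2, x1 + x2])
b = np.linalg.pinv(X.T @ X) @ X.T @ y              # (b0, b1, b2, b3)

# The correspondence noted in the bullet points above:
print(np.isclose(g[0], b[0]))            # True: g0 = b0
print(np.isclose(g[1], b[1] + b[3]))     # True: g1 = b1 + b3
print(np.isclose(g[2], b[2] + b[3]))     # True: g2 = b2 + b3
```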

You might wonder whether we can construct a set of results about the statistical properties of the solution to the normal equations that I've obtained here. In other words, is there a set of results about the estimators of the estimable functions that corresponds to the usual results about the least squares estimator of β in the full-rank case?

The answer is "yes". The full statistical analysis of the less-than-full-rank model is well-established, and well-known. If you want to follow up on this, a great place to start is with Searle (1971).

Reference

Searle, S. R., 1971. Linear Models. Wiley, New York.

1. Very, very interesting, Prof. Giles, thank you. Though I think the use for interpretation is somewhat limited as it seems to me that the estimate for $\beta_{3}$ could be any number as long as the other two betas are "adjusted" accordingly. I will take a look in Searle, however.

1. They're unique if you are using the Moore-Penrose inverse.

2. Well, due to this specific procedure, I agree, and I think it is nice to see that the procedure works. My point is just that a statistical package could estimate the model without the multicollinear variable, save the set of coefficients, then add the multicollinear variable, assign some arbitrary coefficient to it, and adjust the other coefficients accordingly, given the results from the full-rank estimation. For meaningful interpretation, one must do what you did in the last three bullet points anyway.

2. I think that the Moore-Penrose 'solution' in a dummy variable representation of a categorical variable is for the sum of the dummies to equal the 'intercept' - C in EViews. Is that correct? Proof? Interesting - in the ANOVA convention, usually the sum is the negative of 'C', or the sum of all the coefficients equals 0. Bob Parks, WU St Louis

Example (Studenmund, 6th ed., pp. 76-88, the Woody example):

| Var | M-Penrose | Exclude N9 |
|-----|-----------|------------|
| C   | 43995.72  | 18415.63   |
| P   | 0.35      | 0.35       |
| I   | 1.54      | 1.54       |
| N2  | 40010.90  | 65590.99   |
| N3  | 26438.20  | 52018.29   |
| N4  | 14481.10  | 40061.19   |
| N5  | -1899.02  | 23681.07   |
| N6  | 4917.72   | 30497.81   |
| N7  | -12033.12 | 13546.97   |
| N8  | -2339.97  | 23240.12   |
| N9  | -25580.09 |            |

Sum of N2 to N9 (M-Penrose) = 43995.72
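The conjecture checks out numerically on synthetic data too (a NumPy sketch, not the Studenmund data). With an intercept plus a full set of category dummies, the null space of X is spanned by (1, -1, ..., -1)', and the Moore-Penrose solution is orthogonal to the null space, which forces the dummy coefficients to sum to the intercept:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the Woody set-up: an intercept plus a full set of
# category dummies (each observation in exactly one category). The dummies
# sum to the intercept column, so X is rank-deficient.
n, k = 60, 4
cats = rng.integers(0, k, size=n)
D = np.eye(k)[cats]                      # n x k dummy matrix, rows sum to 1
X = np.column_stack([np.ones(n), D])
y = rng.normal(size=n) + cats            # arbitrary response

b = np.linalg.pinv(X) @ y                # minimum-norm solution

# Orthogonality to the null-space vector (1, -1, ..., -1)' gives
# b0 - (b1 + ... + bk) = 0, i.e., the dummy coefficients sum to the intercept:
print(np.isclose(b[0], b[1:].sum()))     # True
```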

1. Bob - I believe you're right.