Sunday, September 16, 2012

Confidence Regions for Regression Coefficients

Let’s consider the usual linear regression model, with the full set of assumptions:

                     y = Xβ + ε ;    ε ~ N[0 , σ²Iₙ] ,    (1)

where X is a non-random (n × k) matrix with full column rank.

Recall that, under our usual set of assumptions for the linear regression model, the OLS coefficient estimator, b = (X'X)⁻¹X'y, has the following sampling distribution:

               b ~ N[β , σ²(X'X)⁻¹] .          (2)

From the form of the covariance matrix for the b vector, we see that, in general:

(i) The leading diagonal elements will not all be the same, so each element of b will usually have a different variance.

(ii) There is no reason for the off-diagonal elements of the covariance matrix to be zero in value, so the elements of the b vector will be pair-wise correlated with each other.
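To make points (i) and (ii) concrete, here is a minimal Python sketch that evaluates σ²(X'X)⁻¹ for a simple regression with an intercept and one regressor. It is only an illustration: the x values and the choice σ² = 1 are made up, and ols_covariance is a hypothetical helper name, not something from any package.

```python
# Illustrative sketch: covariance matrix of the OLS estimator for
# y = beta1 + beta2*x + eps, treating sigma^2 as known.
# The data and sigma^2 are invented for illustration only.

def ols_covariance(x, sigma2):
    """Return sigma^2 * (X'X)^(-1) as a 2x2 nested list, where X = [1, x]."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
    # (X'X)^(-1) = (1/det) * [[sxx, -sx], [-sx, n]]
    return [[sigma2 * sxx / det, -sigma2 * sx / det],
            [-sigma2 * sx / det, sigma2 * n / det]]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
V = ols_covariance(x, sigma2=1.0)
print("var(b1)     =", V[0][0])
print("var(b2)     =", V[1][1])
print("cov(b1, b2) =", V[0][1])
```

With these made-up numbers the two variances are 1.1 and 0.1, and the covariance is -0.3, so the variances differ and the covariance is non-zero.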

You'll also remember that when we develop a confidence interval for one of the elements of β, say βi, we start off with the following probability statement:

                  Pr.[-tc < (bi - βi) / s.e.(bi) < tc] = (1 - α) ,        (3)

where tc is chosen to ensure that the desired probability of (1 - α) is achieved. Equation (3) is then re-written (equivalently) as:

                 Pr.[βi - tc s.e.(bi) < bi < βi + tc s.e.(bi)] = (1 - α),       (4)

and we then manipulate the event whose probability of occurrence we are interested in, until we end up with the following random interval which, if constructed many, many times, would cover the true (but unobserved) βi, 100(1 - α)% of the time:

                 [bi - tc s.e.(bi) , bi + tc s.e.(bi)] .          (5)

Notice that this interval is centered at bi. Making the interval symmetric about this point ensures that we get the shortest (and hence most informative) interval for any fixed values of n, the sample size, and α. (See here and here for more details.)
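For completeness, the algebra that links the events in (3) and (4) to the interval in (5) can be written out as follows. Nothing here is new; the only fact used is that s.e.(bi) is positive, so multiplying through by it preserves the inequalities.

```latex
\begin{aligned}
-t_c < \frac{b_i - \beta_i}{\mathrm{s.e.}(b_i)} < t_c
  &\iff -t_c\,\mathrm{s.e.}(b_i) < b_i - \beta_i < t_c\,\mathrm{s.e.}(b_i) \\
  &\iff \beta_i - t_c\,\mathrm{s.e.}(b_i) < b_i < \beta_i + t_c\,\mathrm{s.e.}(b_i) \\
  &\iff b_i - t_c\,\mathrm{s.e.}(b_i) < \beta_i < b_i + t_c\,\mathrm{s.e.}(b_i)
\end{aligned}
```

The second line is the event in (4), and the last line says that βi lies inside the interval in (5). All of these events are identical, so they share the probability (1 - α).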

Now, suppose that we want to generalize the concept of a confidence interval (that applies to a single element of β) to that of a confidence region, that can be associated with two elements of β at once.

Note that equation (4) is a statement that gives us the probability that the scalar random variable, bi, lies in some interval on the real line. So, if we are interested in two elements of b at once, consider a probability statement of the following type:

                          Pr.[(bi , bj)' is in R] = (1 - α)  . (6)

This is just a statement that gives us the probability that a random vector lies in some two-dimensional region, say R. Just as equation (4) can be manipulated to give us the confidence interval in equation (5), the statement in equation (6) can be manipulated to give us a confidence region that has the corresponding interpretation. That is, it will be a region whose boundaries are random, and the interpretation will be that if we repeatedly construct such a region many, many times, then such regions will cover the true value of the vector, (βi , βj)' , 100(1 - α)% of the time.

Now, the question is, “what does such a region look like?”. Look back at the sampling distribution for the full b vector in equation (2), and comments (i) and (ii) that follow it. In addition, recall that if the full b vector follows this multivariate normal distribution, then all of the marginal distributions associated with its elements will also be normal. In particular, the sub-vector associated with the pair of elements that we are interested in will have the following bivariate normal sampling distribution:

                      (bi , bj)' ~ N[(βi , βj)' , V*] ,              (7)

where the elements of the covariance matrix, V*, come from the appropriate (2 × 2) sub-matrix of σ²(X'X)⁻¹ in (2). The leading diagonal elements of this sub-matrix will be var.(bi) and var.(bj), and the off-diagonal elements will each be cov.(bi , bj). Given points (i) and (ii) above, in general var.(bi) ≠ var.(bj) and cov.(bi , bj) ≠ 0.

Just as the univariate (scalar) normal random variable bi becomes a univariate Student-t random variable when we standardize it and replace the unobserved s.d.(bi) with the observable s.e.(bi), the bivariate normal random vector (bi , bj)' becomes a bivariate Student-t random vector when we essentially standardize each element and replace the unobserved cov.(bi , bj), s.d.(bi) and s.d.(bj) with the observable côv(bi , bj), s.e.(bi) and s.e.(bj). (Of course, these quantities are obtained from the appropriate sub-matrix of the estimated covariance matrix of b.) Strictly speaking, what we do is to work with the (bi , bj)' vector in its entirety, and construct a new vector,

                           (bi* , bj*)' = V*^(-1/2) (bi , bj)',           (8)

where V* now denotes the estimated covariance matrix for (bi , bj)', and V*^(-1/2) satisfies the relationship V*^(-1/2) V*^(-1/2) = V*^(-1).

So, in short, what we end up with in (8) is a random vector that follows a bivariate Student-t distribution, and note that the marginal distributions of a multivariate Student-t distribution are also Student-t distributed. This tells us that when we manipulate equation (6) to get a random confidence region, what we will end up with is a region that is based on the bivariate Student-t distribution. Also, just as the confidence interval in equation (5) was centered at the value of the point estimator, bi, our bivariate confidence region will be centered at the point located by the value taken by the vector point estimator, (bi , bj)'.
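One standard way to make such a region operational is through the quadratic form in the estimation errors. Under the assumptions in (1), the quadratic form (b* - β*)'V*^(-1)(b* - β*), divided by 2, follows an F distribution with 2 and (n - k) degrees of freedom, where b* and β* denote the two-element sub-vectors and V* is the estimated covariance matrix. This is an equivalent route to the bivariate Student-t description above, used here because the F critical value with 2 numerator degrees of freedom has a closed form. The sketch below is an illustration only: the function names and all of the numbers are hypothetical.

```python
# Sketch: is a candidate point (beta_i, beta_j) inside the joint
# 100(1 - alpha)% confidence ellipse centred at (b_i, b_j)?
# Function names and the example numbers are hypothetical.

def f_crit_2(alpha, df):
    """Upper-alpha critical value of F(2, df).

    For 2 numerator degrees of freedom the survival function is
    (1 + 2x/df)^(-df/2), which can be inverted exactly.
    """
    return (df / 2.0) * (alpha ** (-2.0 / df) - 1.0)

def in_confidence_region(beta, b, V, df, alpha=0.05):
    """True if beta lies inside the ellipse centred at b.

    V is the estimated 2x2 covariance matrix of (b_i, b_j)',
    and df is n - k.
    """
    d0, d1 = b[0] - beta[0], b[1] - beta[1]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    # Quadratic form d' V^(-1) d, with the 2x2 inverse written out.
    q = (V[1][1] * d0 * d0 - 2.0 * V[0][1] * d0 * d1
         + V[0][0] * d1 * d1) / det
    return q <= 2.0 * f_crit_2(alpha, df)

# Made-up estimates, covariance matrix and degrees of freedom.
b_hat = (1.0, 2.0)
V_hat = [[0.04, -0.01], [-0.01, 0.01]]
print(in_confidence_region((1.1, 1.95), b_hat, V_hat, df=20))
print(in_confidence_region((2.0, 2.0), b_hat, V_hat, df=20))
```

The region is the set of all candidate points for which the function returns True, and by construction it is centered at the point estimate.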

We can imagine what this region will look like if we think of the underlying density function in the same way that we did in the handout titled “Spherical Distributions” that went with a recent post. In that handout we saw that in the case of a bivariate normal density, with a fixed mean vector, the factors that really mattered were any differences between the variances of the two random variables, and the magnitude and sign of the covariance between them. The same is going to apply here when we consider the bivariate Student-t distribution. The variances in this case depend solely on the “degrees of freedom” parameter, which is just (n - k) in our case.

Here are a perspective plot and a contour plot for the bivariate Student-t distribution, with a zero mean vector, when each element of the random vector has the same variance (determined by the degrees of freedom, which I have set to 3) and the covariance between them is zero:

These, and the following plots, were created using the fMultivar package in R. The code I used is on this blog's code page, here. (In fact, the code also generates dynamic "animated" views of the bivariate densities as they are rotated and viewed from various perspectives.)

As you would anticipate from the earlier blog post, if the two elements of the random vector have different variances and/or there is a non-zero covariance between them, the plots change. Specifically, the circular contours will become elliptical. The following two graphs relate to the case where the correlation is changed from 0.0 to -0.7, and you know already (from the earlier post) what would happen if the correlation were (say) 0.4:
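If you want to check the shape of these contours numerically, the following Python sketch evaluates the standard bivariate Student-t density with unit scale parameters. It is not the R code used for the plots; bvt_density is a hypothetical helper, and the formula is the textbook density for two dimensions, where the normalizing constant simplifies to 1 / (2π√(1 - ρ²)).

```python
import math

# Sketch: standard bivariate Student-t density (zero mean vector,
# unit scale parameters, correlation parameter rho, nu degrees of freedom).
# bvt_density is a hypothetical helper written for this illustration.

def bvt_density(x, y, rho=0.0, nu=3.0):
    one_minus_r2 = 1.0 - rho * rho
    # Quadratic form (x, y) R^(-1) (x, y)' with R = [[1, rho], [rho, 1]].
    q = (x * x - 2.0 * rho * x * y + y * y) / one_minus_r2
    const = 1.0 / (2.0 * math.pi * math.sqrt(one_minus_r2))
    return const * (1.0 + q / nu) ** (-(nu + 2.0) / 2.0)

# With rho = 0 the density depends only on the distance from the origin,
# so the contours are circles.
print(bvt_density(1.0, 1.0, rho=0.0), bvt_density(1.0, -1.0, rho=0.0))

# With rho = -0.7 the point (1, -1) has a higher density than (1, 1):
# the contours are stretched along the downward-sloping diagonal.
print(bvt_density(1.0, 1.0, rho=-0.7), bvt_density(1.0, -1.0, rho=-0.7))
```

Two points at the same distance from the origin have the same density when ρ = 0, but not when ρ = -0.7, which is just the circular-versus-elliptical contrast seen in the contour plots.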

Now, let’s apply all of this to the notion of a confidence region. Look at the last contour plot, and focus on a single contour, say the one labeled 0.1. Every contour encloses a region that contains a fixed share of the bivariate density. Suppose, for the sake of illustration, that this one encloses 90% of it. That single elliptical line then marks out a region that has 90% probability content. Notice that the elliptical contours are all “centered” at the point (0 , 0).

When we relate this to the construction of a confidence region for (β1 , β2)' in our regression model, we can see that once we choose the confidence level (say, 95%) we are concentrating on just one of the contours, and that the region will be “centered” at the point determined by (b1 , b2)' . This is what we see when we estimate a regression model with EViews, and then select View, Coefficient Diagnostics, Confidence Ellipse. We then have the opportunity to specify the confidence level, which coefficients are to be considered in a pair-wise manner, and how we want to display the individual confidence intervals for each individual coefficient:

In this last plot, we see that the confidence ellipse for a 95% confidence level is “centered” at the point (1.42, -0.007), which corresponds to the OLS estimates for the intercept and slope coefficients in the regression output above. If we repeated this exercise many, many times then 95% of the regions created would cover the true values of the intercept and slope coefficients in this model. Of course, we will never know if this particular region does. Notice that the dotted straight vertical lines in the confidence ellipses plot give us the limits for a 95% confidence interval for just the intercept coefficient by itself. In this case this interval has a lower limit of just under 1.2, and an upper limit of just under 1.7. Using the OLS regression output above, you should be able to quickly determine the exact values for the limits of this interval. In the same manner, the two horizontal straight dotted lines give us the lower and upper limits for a 95% confidence interval for just the slope coefficient by itself. Again, you can use the OLS regression output to convince yourself that these limits are correct.
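The repeated-sampling interpretation can be checked with a small simulation. The sketch below is not based on the EViews workfile: the true coefficients, the regressor values, the error variance and the seed are all invented, and the helper names are hypothetical. It uses the F-based form of the ellipse, with the closed-form critical value that is available when there are 2 numerator degrees of freedom.

```python
import random

# Sketch: Monte Carlo coverage of a 95% joint confidence ellipse for the
# intercept and slope of a simple regression. Every number here is made up.

def f_crit_2(alpha, df):
    """Upper-alpha critical value of F(2, df), in closed form."""
    return (df / 2.0) * (alpha ** (-2.0 / df) - 1.0)

def fit_simple_ols(x, y):
    """OLS for y = b1 + b2*x; returns (b, V) with V = s^2 (X'X)^(-1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    b1 = (sxx * sy - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # unbiased estimator of sigma^2, with k = 2
    V = [[s2 * sxx / det, -s2 * sx / det],
         [-s2 * sx / det, s2 * n / det]]
    return (b1, b2), V

def covers(beta, b, V, df, alpha):
    """True if the ellipse centred at b contains the true beta."""
    d0, d1 = b[0] - beta[0], b[1] - beta[1]
    det = V[0][0] * V[1][1] - V[0][1] ** 2
    q = (V[1][1] * d0 * d0 - 2.0 * V[0][1] * d0 * d1
         + V[0][0] * d1 * d1) / det
    return q <= 2.0 * f_crit_2(alpha, df)

def coverage_rate(n=25, reps=4000, alpha=0.05, seed=12345):
    rng = random.Random(seed)
    beta = (1.0, -0.5)                       # hypothetical true values
    x = [float(i) for i in range(1, n + 1)]  # fixed (non-random) regressor
    hits = 0
    for _ in range(reps):
        y = [beta[0] + beta[1] * xi + rng.gauss(0.0, 1.0) for xi in x]
        b, V = fit_simple_ols(x, y)
        if covers(beta, b, V, n - 2, alpha):
            hits += 1
    return hits / reps

print("Estimated coverage:", coverage_rate())
```

The printed proportion should be close to 0.95, up to simulation error. Any single ellipse either covers the true point or it does not, which is exactly why we can never know whether the one we actually computed does.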

Note that as you increase the confidence level, the area of the confidence ellipse will increase, in the same way that a confidence interval becomes wider as you increase the confidence level, ceteris paribus. Finally, the direction in which the confidence ellipse is “sloped” in this example indicates that b1 and b2 must have a negative covariance. This is readily verified by selecting View, Covariance matrix and observing that the covariance is -0.000251:
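Both of these remarks can be illustrated numerically. In the sketch below the covariance matrix and degrees of freedom are made up (they are not the values from the EViews output), and the helper names are hypothetical. The area formula is the standard one for the ellipse {d : d'V⁻¹d ≤ c}, namely πc√det(V), with c equal to twice the F critical value. The tilt is the angle of the major axis, which points along the leading eigenvector of V.

```python
import math

# Sketch: area and tilt of a confidence ellipse for two coefficients.
# V_hat and df below are invented for illustration.

def f_crit_2(alpha, df):
    """Upper-alpha critical value of F(2, df), in closed form."""
    return (df / 2.0) * (alpha ** (-2.0 / df) - 1.0)

def ellipse_area(V, df, alpha):
    """Area of the 100(1 - alpha)% ellipse {d : d' V^(-1) d <= 2 * F_crit}."""
    det = V[0][0] * V[1][1] - V[0][1] ** 2
    return math.pi * 2.0 * f_crit_2(alpha, df) * math.sqrt(det)

def major_axis_angle(V):
    """Angle (radians) between the major axis and the horizontal axis."""
    return 0.5 * math.atan2(2.0 * V[0][1], V[0][0] - V[1][1])

V_hat = [[0.04, -0.01], [-0.01, 0.01]]
for level in (0.90, 0.95, 0.99):
    print(level, ellipse_area(V_hat, df=20, alpha=1.0 - level))

# A negative covariance gives a negative angle: the ellipse slopes downward.
print(major_axis_angle(V_hat))
```

The area grows with the confidence level because only the critical value changes, and the sign of the tilt is the sign of the covariance.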

The EViews workfile that I used for this example is available on the code page for this blog, here, and the data are here.

Postscript ~ What would the confidence region look like if we were dealing with 3 coefficients?

© 2012, David E. Giles


  1. I would love to see a confidence "blimp" in a seminar one day! You could call it a holographic Hindenburg chart.

