Let's consider the usual linear regression model, with the full set of assumptions:

y = Xβ + ε ; ε ~ N[0 , σ^{2}I_{n}] , (1)

where X is a non-random (n × k) matrix with full column rank.

Recall that, under our usual set of assumptions for the linear regression model, the OLS coefficient estimator, b = (X'X)^{-1}X'y, has the following sampling distribution:

b ~ N[β , σ^{2}(X'X)^{-1}] . (2)
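As a quick sanity check on (2), here is a small Monte Carlo sketch (the sample size, design, and parameter values are all just illustrative): we hold X fixed, draw y repeatedly from the model in (1), and compare the empirical mean and covariance of the OLS estimates with β and σ^{2}(X'X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, sigma = 50, 3, 2.0

# Fixed (non-random) design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])   # "true" coefficients (made up)

XtX_inv = np.linalg.inv(X.T @ X)

reps = 20000
bs = np.empty((reps, k))
for r in range(reps):
    y = X @ beta + sigma * rng.standard_normal(n)   # draw from (1)
    bs[r] = XtX_inv @ X.T @ y                       # OLS: b = (X'X)^{-1}X'y

print(bs.mean(axis=0))   # should be close to beta
print(np.cov(bs.T))      # should be close to sigma**2 * XtX_inv
```

Note that the empirical covariance matrix has unequal diagonal elements and non-zero off-diagonal elements, which is exactly points (i) and (ii) below.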

From the form of the covariance matrix for the b vector, we see that, in general:

(i) The leading diagonal elements will __not__ all be the same, so each element of b will usually have a different variance.

(ii) There is no reason for the off-diagonal elements of the covariance matrix to be zero in value, so the elements of the b vector will be pair-wise correlated with each other.

You'll also remember that when we develop a confidence interval for one of the elements of β, say β_{i}, we start off with the following probability statement:

Pr.[-t_{c} < (b_{i} - β_{i}) / s.e.(b_{i}) < t_{c}] = (1 - α) , (3)

where t_{c} is chosen to ensure that the desired probability of (1 - α) is achieved. Equation (3) is then re-written (equivalently) as:

Pr.[β_{i} - t_{c}s.e.(b_{i}) < b_{i} < β_{i} + t_{c}s.e.(b_{i})] = (1 - α) , (4)

and we then manipulate the event whose probability of occurrence we are interested in, until we end up with the following random interval which, if constructed many, many times, would cover the true (but unobserved) β_{i}, 100(1 - α)% of the time:

[b_{i} - t_{c}s.e.(b_{i}) , b_{i} + t_{c}s.e.(b_{i})] . (5)
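For concreteness, here is how the interval in (5) might be computed for a single coefficient in Python. The data, the design, and the choice of α = 0.05 are all just illustrative; t_{c} is the usual (1 - α/2) quantile of Student's t with (n - k) degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])  # illustrative design
beta_true = np.array([2.0, 3.0])                        # made-up "truth"
y = X @ beta_true + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # OLS estimates
resid = y - X @ b
s2 = resid @ resid / (n - k)             # unbiased estimator of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))      # s.e.(b_i), i = 1, ..., k

alpha = 0.05
t_c = stats.t.ppf(1 - alpha / 2, df=n - k)

i = 1                                    # interval for the slope coefficient
ci = (b[i] - t_c * se[i], b[i] + t_c * se[i])
print(ci)
```

The interval is symmetric about b_{i} by construction, with half-width t_{c}·s.e.(b_{i}).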

Notice that this interval is centered at b_{i}. Making the interval symmetric about this point ensures that we get the shortest (and hence most informative) interval for any fixed values of n, the sample size, and α. (See **here** and **here** for more details.)
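The coverage claim is easy to check by simulation: construct the interval (5) in many artificial samples drawn from (1) and count how often it contains the true β_{i}. A sketch, again with made-up design and parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, alpha = 25, 2, 0.05
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # fixed regressors
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)
t_c = stats.t.ppf(1 - alpha / 2, df=n - k)

reps = 10000
covered = 0
for _ in range(reps):
    y = X @ beta + rng.standard_normal(n)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - k) * XtX_inv[1, 1])  # s.e.(b_1)
    # does the random interval (5) cover the true slope?
    if b[1] - t_c * se < beta[1] < b[1] + t_c * se:
        covered += 1

print(covered / reps)   # should be close to 1 - alpha = 0.95
```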

Now, suppose that we want to generalize the concept of a confidence *interval* (that applies to a single element of b) to that of a confidence *region*, that can be associated with two elements of b at once.