Let’s consider the usual linear regression model, with the full set of assumptions:
$y = X\beta + \varepsilon$ ; $\varepsilon \sim N[0, \sigma^2 I_n]$ , (1)
where X is a non-random (n × k) matrix with full column rank.
Recall that, under our usual set of assumptions for the linear regression model, the OLS coefficient estimator, $b = (X'X)^{-1}X'y$, has the following sampling distribution:
$b \sim N[\beta, \sigma^2 (X'X)^{-1}]$ . (2)
From the form of the covariance matrix for the b vector, we see that, in general:
(i) The leading diagonal elements will not all be the same, so each element of b will usually have a different variance.
(ii) There is no reason for the off-diagonal elements of the covariance matrix to be zero, so the elements of the b vector will generally be pairwise correlated with each other.
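Points (i) and (ii) are easy to check numerically. The following is a minimal sketch, not part of the original discussion: the design matrix, the sample size, and the value of the error variance are all made-up illustrative choices. It simply evaluates the covariance matrix in (2) and converts it to a correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
sigma2 = 4.0  # assumed (illustrative) error variance

# Hypothetical design: an intercept plus two regressors that are
# correlated with each other.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Covariance matrix of b from equation (2): sigma^2 (X'X)^{-1}
cov_b = sigma2 * np.linalg.inv(X.T @ X)

# Standard deviations of the elements of b, and their correlation matrix
sd_b = np.sqrt(np.diag(cov_b))
corr_b = cov_b / np.outer(sd_b, sd_b)

print("variances of the elements of b:", np.diag(cov_b))
print("correlation matrix of b:")
print(corr_b)
```

With this particular design the variances on the diagonal differ from one another, and the off-diagonal correlations are non-zero. A design with mutually orthogonal columns would be the special case in which the off-diagonal elements vanish.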
You'll also remember that when we develop a confidence interval for one of the elements of $\beta$, say $\beta_i$, we start off with the following probability statement:
$\Pr[-t_c < (b_i - \beta_i) / \mathrm{s.e.}(b_i) < t_c] = (1 - \alpha)$ , (3)
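where $t_c$ is the critical value, taken from the Student-t distribution with (n - k) degrees of freedom, that ensures that the desired probability of (1 - α) is achieved. Equation (3) is then re-written (equivalently) as:
$\Pr[\beta_i - t_c \, \mathrm{s.e.}(b_i) < b_i < \beta_i + t_c \, \mathrm{s.e.}(b_i)] = (1 - \alpha)$ , (4)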
and we then manipulate the event whose probability of occurrence we are interested in, until we end up with the following random interval which, if constructed many, many times, would cover the true (but unobserved) $\beta_i$ in 100(1 - α)% of those repetitions:
$[\, b_i - t_c \, \mathrm{s.e.}(b_i) \; , \; b_i + t_c \, \mathrm{s.e.}(b_i) \,]$ . (5)
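The "constructed many, many times" interpretation of the interval in (5) can be illustrated with a small Monte Carlo experiment. This is only a sketch under assumed values: the helper name `ols_conf_int`, the true coefficient vector, the error standard deviation, and the sample size are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats

def ols_conf_int(X, y, i, alpha=0.05):
    """Interval (5) for the i-th coefficient (hypothetical helper)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # OLS estimator
    resid = y - X @ b
    s2 = resid @ resid / (n - k)               # unbiased estimator of sigma^2
    se_i = np.sqrt(s2 * XtX_inv[i, i])         # s.e.(b_i)
    t_c = stats.t.ppf(1 - alpha / 2, df=n - k) # critical value t_c
    return b[i] - t_c * se_i, b[i] + t_c * se_i

rng = np.random.default_rng(42)
n, alpha = 30, 0.05
beta = np.array([1.0, 2.0, -0.5])              # assumed "true" coefficients
# X is generated once and then held fixed, matching the non-random X in (1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

reps = 5000
covered = 0
for _ in range(reps):
    y = X @ beta + rng.normal(scale=1.5, size=n)
    lo, hi = ols_conf_int(X, y, i=1, alpha=alpha)
    covered += (lo < beta[1] < hi)

coverage = covered / reps
print(f"empirical coverage: {coverage:.3f}")
```

The empirical coverage rate should be close to the nominal (1 - α) = 0.95, apart from simulation noise. Note that it is the interval that varies from replication to replication; the true coefficient stays fixed.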
Notice that this interval is centered at $b_i$. Because the Student-t distribution is symmetric and unimodal, making the interval symmetric about this point ensures that we get the shortest (and hence most informative) interval for any fixed values of n, the sample size, and α. (See here and here for more details.)
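The "shortest interval" claim can also be checked directly. The sketch below splits the total tail probability α unevenly between the two tails of the Student-t distribution and computes the resulting interval width for each split. The degrees of freedom and the standard error are assumed, illustrative values; the width scales linearly with the standard error, so its value does not affect where the minimum occurs.

```python
import numpy as np
from scipy import stats

alpha, df = 0.05, 20   # illustrative values, not taken from the text
se = 1.0               # illustrative standard error

# Put probability a1 in the lower tail and (alpha - a1) in the upper tail
lower_tail = np.linspace(0.001, alpha - 0.001, 49)
upper_tail = alpha - lower_tail

# Width of the interval implied by each split of alpha
width = se * (stats.t.ppf(1 - upper_tail, df) - stats.t.ppf(lower_tail, df))

best = lower_tail[np.argmin(width)]
print(f"lower-tail probability giving the shortest interval: {best:.4f}")
print(f"equal-tails split (alpha / 2): {alpha / 2:.4f}")
```

On this grid the minimum width occurs at the equal-tails split, which is the choice that yields the symmetric interval.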
Now, suppose that we want to generalize the concept of a confidence interval, which applies to a single element of β, to that of a confidence region that can be associated with two elements of β at once.