Econometrics Beat: Dave Giles' Blog: All About Spherically Distributed Regression Errors

Thursday, May 2, 2013

All About Spherically Distributed Regression Errors

This post is based on a handout that I use for one of my courses, and it relates to the usual linear regression model,

y = Xβ + ε

In our list of standard assumptions about the error term in this multiple linear regression model, we include one that incorporates both homoskedasticity and the absence of autocorrelation. That is, the individual values of the errors are assumed to be generated by a random process whose variance (σ²) is constant, and all possible distinct pairs of these values are uncorrelated. This implies that the full error vector, ε, has a scalar covariance matrix, σ²I_n, where I_n is the n × n identity matrix and n is the sample size.
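To make the idea of a scalar covariance matrix concrete, here is a minimal Python sketch. It is only an illustration: the helper names `scalar_covariance` and `is_spherical` are hypothetical, and the numerical tolerance is an arbitrary choice.

```python
def scalar_covariance(sigma2, n):
    """Return the n x n matrix sigma2 * I_n as a list of rows."""
    return [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]


def is_spherical(cov, tol=1e-12):
    """True if cov has equal diagonal entries (homoskedasticity)
    and zero off-diagonal entries (no correlation between distinct errors)."""
    common_variance = cov[0][0]
    for i, row in enumerate(cov):
        for j, value in enumerate(row):
            target = common_variance if i == j else 0.0
            if abs(value - target) > tol:
                return False
    return True


# A scalar covariance matrix passes the check ...
print(is_spherical(scalar_covariance(2.0, 4)))
# ... while unequal variances or a non-zero covariance do not.
print(is_spherical([[1.0, 0.0], [0.0, 9.0]]))
print(is_spherical([[1.0, 0.5], [0.5, 1.0]]))
```

The check simply restates the two parts of the assumption: a common variance on the diagonal, and zeros everywhere else.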

We refer to this overall situation as one in which the values of the error term follow a “Spherical Distribution”. Let's take a look at the origin of this terminology.

The following discussion is quite general, so you'll realize that it applies to any random variables, not just the error term in our regression model. Further, so that we can look at some diagrams, let’s consider the special case of two dimensions, rather than three, so that what would be a (3-dimensional) sphere becomes a (2-dimensional) circle.

So, consider the pair of random values ε_i and ε_j, which we’ll generically denote x and y. (This latter terminology has nothing to do with X and y in the regression model.) The values of these two random variables are plotted in the directions of the x and y axes in the graphs which follow.

In the three-dimensional plots we will see the joint probability density function, p(x, y), in the direction of the z axis. All of these three-dimensional plots are for values in the range -3 ≤ x, y ≤ 3. Scales are given on the associated two-dimensional “contour” plots. The latter plots show “isolines”, that is, lines that join up (x, y) points that yield the same value (height) for p(x, y). These contours are exactly analogous to the contour lines that you see on a topographic map to depict the nature of the terrain. They reflect what you see when you look down vertically onto the three-dimensional (bivariate) density plots.

The appearances of the density plots, and the shape of the associated contour plots, depend upon the variances of x and y, and the covariance (and hence correlation) between these two random variables.
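One way to get a feel for the role of the variances and the covariance is to simulate some pairs and look at their sample moments. The following Python sketch is just that, a sketch. It assumes bivariate normality with zero means, builds the draws from two independent standard normals via a hand-written Cholesky factor, and uses hypothetical function names (`draw_bivariate_normal`, `sample_moments`). It is not the R code behind the plots in this post.

```python
import math
import random


def draw_bivariate_normal(n, var_x, var_y, cov_xy, seed=0):
    """Draw n zero-mean bivariate normal pairs with the given moments."""
    det = var_x * var_y - cov_xy ** 2
    if var_x <= 0 or det < 0:
        raise ValueError("covariance matrix must be positive semi-definite")
    rng = random.Random(seed)
    sd_x = math.sqrt(var_x)
    load = cov_xy / sd_x                      # loading of y on the first shock
    resid_sd = math.sqrt(var_y - load ** 2)   # what is left for the second shock
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        draws.append((sd_x * z1, load * z1 + resid_sd * z2))
    return draws


def sample_moments(draws):
    """Return (sample var of x, sample var of y, sample covariance)."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in draws) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in draws) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in draws) / (n - 1)
    return var_x, var_y, cov_xy


print(sample_moments(draw_bivariate_normal(20000, 1.0, 9.0, 0.7, seed=1)))
```

With a large number of draws the sample moments should sit close to the values that were requested, and a scatter plot of the draws would trace out the same shape as the corresponding contour plot.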

If x and y have the same variance (i.e., if ε_i and ε_j are homoskedastic), and if they are uncorrelated (i.e., if ε_i and ε_j are not autocorrelated), then the contours will form circles, as they do in the bivariate normal examples below. If there were three random variables we would need a four-dimensional density graph, and the contours would form a sphere. Hence the term “Spherical Distribution”. If there were four or more random variables the sphere would become a “hyper-sphere”.
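The circular contours can be checked numerically. The Python sketch below assumes a zero-mean bivariate normal density (the case used in the examples in this post) and evaluates it at several points that are all the same distance from the origin. The function names `bvn_pdf` and `densities_on_circle` are hypothetical.

```python
import math


def bvn_pdf(x, y, var_x, var_y, cov_xy):
    """Zero-mean bivariate normal density p(x, y)."""
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form (x, y) * inverse(Sigma) * (x, y)'
    q = (var_y * x * x - 2.0 * cov_xy * x * y + var_x * y * y) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def densities_on_circle(radius, var_x, var_y, cov_xy, n_points=8):
    """Density heights at n_points equally spaced points on a circle."""
    heights = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        heights.append(bvn_pdf(x, y, var_x, var_y, cov_xy))
    return heights


# Equal variances, zero covariance: every point on the circle has the same height.
spherical = densities_on_circle(1.5, 1.0, 1.0, 0.0)
print(max(spherical) - min(spherical))

# Unequal variances: the heights differ, so the circle is not a contour.
non_spherical = densities_on_circle(1.5, 1.0, 9.0, 0.0)
print(max(non_spherical) - min(non_spherical))
```

In the spherical case the density depends on (x, y) only through x² + y², which is why every point on a circle centred at the origin sits at the same height.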

If x and y have different variances, the joint density surface is no longer symmetrical in the x and y directions, and then the contour plot takes the form of an ellipse, rather than a circle. The same thing happens if x and y are correlated, even if they have the same variance. In this case, the direction in which the major axis of the ellipse slopes (upward or downward) is determined by the sign of the correlation between x and y.
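The orientation and elongation of the ellipse can be worked out from the covariance matrix. The following Python sketch uses the standard result that the axes of the contour ellipse lie along the eigenvectors of the 2 × 2 covariance matrix. The function names `major_axis_angle_deg` and `axis_ratio` are hypothetical, and the sketch assumes a positive definite covariance matrix.

```python
import math


def major_axis_angle_deg(var_x, var_y, cov_xy):
    """Angle (degrees, measured from the x axis) of the major axis of the
    contour ellipse. Returns None when the contours are circles."""
    if var_x == var_y and cov_xy == 0:
        return None  # a circle has no major axis
    return math.degrees(0.5 * math.atan2(2.0 * cov_xy, var_x - var_y))


def axis_ratio(var_x, var_y, cov_xy):
    """Ratio of major to minor axis length: sqrt(largest / smallest eigenvalue)."""
    centre = 0.5 * (var_x + var_y)
    half_gap = math.hypot(0.5 * (var_x - var_y), cov_xy)
    lam_max, lam_min = centre + half_gap, centre - half_gap
    if lam_min <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return math.sqrt(lam_max / lam_min)


# Equal variances: the major axis slopes up or down according to the sign
# of the covariance.
print(major_axis_angle_deg(1.0, 1.0, 0.5), axis_ratio(1.0, 1.0, 0.5))
print(major_axis_angle_deg(1.0, 1.0, -0.75), axis_ratio(1.0, 1.0, -0.75))
# Unequal variances, no correlation: the ellipse is aligned with the axes.
print(major_axis_angle_deg(1.0, 9.0, 0.0), axis_ratio(1.0, 9.0, 0.0))
```

When the variances are equal, the major axis lies along the 45-degree line for a positive covariance and along the minus 45-degree line for a negative one. The size of the correlation then affects only how stretched the ellipse is.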

Some examples follow, all for the case where x and y follow a bivariate normal distribution with zero means for x and y. The plots were all done using R (of course).

E(x, y)' = (0, 0)'; var(x) = var(y) = 1; cov(x, y) = 0

E(x, y)' = (0, 0)'; var(x) = 1; var(y) = 9; cov(x, y) = 0

E(x, y)' = (0, 0)'; var(x) = var(y) = 1; cov(x, y) = 0.5

E(x, y)' = (0, 0)'; var(x) = var(y) = 1; cov(x, y) = -0.75

E(x, y)' = (0, 0)'; var(x) = var(y) = 1; cov(x, y) = 0.99

E(x, y)' = (0, 0)'; var(x) = 1; var(y) = 9; cov(x, y) = 0.7
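As a quick cross-check on the six settings listed above, the Python sketch below converts each covariance into a correlation and reports whether the contours should be circles or ellipses. The names `CASES`, `correlation` and `describe` are hypothetical, and the sketch is only an illustration, not the code used to produce the plots.

```python
import math

# (var(x), var(y), cov(x, y)) for the six examples, in the order given above.
CASES = [
    (1.0, 1.0, 0.0),
    (1.0, 9.0, 0.0),
    (1.0, 1.0, 0.5),
    (1.0, 1.0, -0.75),
    (1.0, 1.0, 0.99),
    (1.0, 9.0, 0.7),
]


def correlation(var_x, var_y, cov_xy):
    """Correlation implied by the two variances and the covariance."""
    return cov_xy / math.sqrt(var_x * var_y)


def describe(var_x, var_y, cov_xy):
    """Return (correlation, contour shape) for one setting."""
    rho = correlation(var_x, var_y, cov_xy)
    if abs(rho) > 1.0:
        raise ValueError("covariance is too large for these variances")
    shape = "circle" if (var_x == var_y and cov_xy == 0) else "ellipse"
    return rho, shape


for case in CASES:
    rho, shape = describe(*case)
    print(case, round(rho, 4), shape)
```

Only the first setting is spherical. Note also that in the last setting the covariance of 0.7 corresponds to a correlation of only about 0.23, because the standard deviation of y is 3.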

Now, consider a final case:

Do x and y have the same means?

Do they have the same variances?

Are they correlated - if so, positively or negatively?

8 comments:

Pedro H. C. Sant'Anna, May 3, 2013 at 6:40 AM
This is very much related to determining the number of factors in large factor models, as considered by Onatski (2009, Econometrica; 2010, ReStat).

Also, for high-dimensional data, few contributions have been made on testing sphericity. See this forthcoming paper in the Annals of Statistics: http://www.fgv.br/professor/mjmoreira/papers/OMHrevision.pdf

Hence, although this is quite a classical topic, there are still a lot of things going on!!

=)

Very nice post!
Anonymous, May 3, 2013 at 2:59 PM
great post, I first saw this on R-bloggers.com. Love the 3-d graphs. I am curious if the code will be made available.
Anonymous, May 3, 2013 at 5:51 PM
nice post. i remember when this stuff used to confuse me in the first ever econometrics course i took.
Will, May 7, 2013 at 4:57 AM
Hopefully nobody minds the spoiler now (it's been a few days). My answer is:

E(x, y)' = (3, 0)'
var(x) = 0.5
var(y) = 2
cov(x, y) = 1/6

Unknown, July 23, 2018 at 1:44 PM
What are elliptical errors?
Dave Giles, July 24, 2018 at 3:08 AM
Ones that follow an elliptically symmetric probability distribution. Examples include the multivariate Normal distribution, and the multivariate Student-t distribution. For a formal definition and other examples, see https://en.wikipedia.org/wiki/Elliptical_distribution and https://www.jstor.org/stable/1403038