Econometrics Beat: Dave Giles' Blog: Orthogonal Regression: First Steps

Sunday, November 16, 2014

Orthogonal Regression: First Steps

When I'm introducing students in my introductory economic statistics course to the simple linear regression model, I like to point out to them that fitting the regression line so as to minimize the sum of squared residuals, in the vertical direction, is just one possibility.

They see, easily enough, that squaring the residuals deals with the positive and negative signs, and that this prevents obtaining a "visually silly" fit through the data. Pointing out that one could achieve this by working with the absolute values of the residuals provides the opportunity to discuss robustness to outliers, and to link the discussion back to something they already know: the difference between the behaviours of the sample mean and the sample median in this respect.

We also discuss the fact that measuring the residuals in the vertical ("y") direction is intuitively sensible, because the model is purporting to "explain" the y variable. Any explanatory failure should presumably be measured in this direction. However, I also note that there are other options - such as measuring the residuals in the horizontal ("x") direction.

Perhaps more importantly, I also mention "orthogonal residuals". I mention them. I don't go into any details. Frankly, there isn't time; and in any case this is usually the students' first exposure to regression analysis and they have enough to be dealing with. However, I've thought that we really should provide students with an introduction to orthogonal regression - just in the simple regression situation - once they've got basic least squares under their belts.

The reason is that orthogonal regression comes up later in econometrics, in more complex forms, at least for some of these students; but typically they haven't seen the basics. Indeed, orthogonal regression is widely used (and misused; see Carroll and Ruppert, 1996) to deal with certain errors-in-variables problems. For example, see Madansky (1959).

That got me thinking. Maybe what follows is a step towards filling this gap.

Let's focus on a simple regression model,

y_i = β₀ + β₁x_i + ε_i ; ε_i ~ i.i.d. N [0, σ²] . (1)

Let s_xx, s_yy, and s_xy be the sample variance of x, the sample variance of y, and the sample covariance of x and y, respectively. Specifically, if n is the sample size, and x* and y* are the sample averages of the x_i's and the y_i's, then

s_xx = [Σ(x_i - x*)²] / (n - 1), s_yy = [Σ(y_i - y*)²] / (n - 1), and s_xy = [Σ(x_i - x*)(y_i - y*)] / (n - 1).

We all know that the OLS estimator of β₁ is b₁ = (s_xy / s_xx), and the associated estimator of β₀ is b₀ = y* - b₁x*. These are also the maximum likelihood estimators of the regression coefficients if x is non-random, given the normality of the errors. So, they are "best unbiased", consistent, and asymptotically efficient estimators.
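For readers who like to see the arithmetic, here is a minimal Python sketch of the sample moments and the OLS formulas above. It is only an illustration: the function names and the small data sets are hypothetical, and the (n - 1) divisor follows the definition of s_xx given earlier.

```python
def sample_moments(x, y):
    """Return (x*, y*, s_xx, s_yy, s_xy), using the (n - 1) divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return x_bar, y_bar, s_xx, s_yy, s_xy


def ols_fit(x, y):
    """OLS estimates: b1 = s_xy / s_xx and b0 = y* - b1 * x*."""
    x_bar, y_bar, s_xx, _, s_xy = sample_moments(x, y)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(ols_fit(x, y))
```

Because the toy points lie exactly on a line, the printed estimates should match its intercept and slope up to floating-point rounding.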

Now, recall that the shortest distance between a point and a straight line is obtained if we measure orthogonally (at right angles) to the line. So, let's think about measuring our regression residuals in this way:


In the diagram above, the red line is the fitted regression line; X is just one typical observed data-point; and the line XB is orthogonal to the red line. The i-th orthogonal residual has length d_i.

If we're fitting the line using OLS, then the residuals that we use are vertical residuals, such as e_i in the diagram. However, if we're going to follow up on the idea of fitting the regression line so as to minimize the sum of the squared orthogonal residuals, then the first thing that we need to do is to figure out the expression for the length of an orthogonal residual, d_i.

We can do this by using a little trigonometry. Of course, most students will deny having ever learned any trigonometry, but they're ~~lying~~ exaggerating. They just can't remember back to Grade 5 - or whenever. So, you just have to prod them a little - figuratively speaking, of course. Let's look at a second diagram:

In this case, a typical observed data-point is at D. Because BD is orthogonal to the red line, the angle BAD = the angle BDC = θ, say.

From the triangle, BAD, we see that sin(θ) = (BD / AD) = d_i / [x_i - (y_i - b^o₀) / b^o₁]

From the triangle, BCD, we see that cos(θ) = (BD / CD) = d_i / (b^o₀ + b^o₁x_i - y_i)

Because cos²(θ) + sin²(θ) = 1, and x_i - (y_i - b^o₀) / b^o₁ = (b^o₀ + b^o₁x_i - y_i) / b^o₁, it follows that

(d_i²) / (b^o₀ + b^o₁x_i - y_i)² + (b^o₁²d_i²) / (b^o₀ + b^o₁x_i - y_i)² = 1 .

So,
d_i² (1 + b^o₁²) = (y_i - b^o₀ - b^o₁x_i)²,

or, up to sign,
d_i = (y_i - b^o₀ - b^o₁x_i) / (1 + b^o₁²)^½ .

So, to fit the orthogonal regression we need to find the values of b^o₀ and b^o₁ that will minimize the function,

S = Σ[ y_i - b^o₀ - b^o₁x_i]² / [1 + b^o₁²] ,

where the summation runs from i = 1 to n.
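The distance formula and the criterion S are easy to check numerically. The sketch below (function names, line, and data points all hypothetical) computes d_i from the formula just derived, compares its absolute value with the length of XB obtained by projecting the point onto the line, and confirms that S is simply the sum of the squared d_i. The formula gives a signed distance; only its square enters S.

```python
import math


def orthogonal_residual(xi, yi, b0, b1):
    """Signed orthogonal residual d_i = (y_i - b0 - b1*x_i) / sqrt(1 + b1^2)."""
    return (yi - b0 - b1 * xi) / math.sqrt(1.0 + b1 ** 2)


def foot_of_perpendicular(xi, yi, b0, b1):
    """Point B on the line y = b0 + b1*x closest to (xi, yi), by vector projection."""
    # (0, b0) lies on the line and (1, b1) is its direction vector.
    t = (xi + b1 * (yi - b0)) / (1.0 + b1 ** 2)
    return t, b0 + b1 * t


def orthogonal_ss(b0, b1, x, y):
    """The criterion S(b0, b1): sum of squared orthogonal residuals."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (1.0 + b1 ** 2)


# Hypothetical line and data points.
b0, b1 = 1.0, 2.0
x = [3.0, 0.5, -1.0]
y = [1.0, 4.0, 0.0]
for xi, yi in zip(x, y):
    bx, by = foot_of_perpendicular(xi, yi, b0, b1)
    # Formula-based length versus the geometric length of XB.
    print(abs(orthogonal_residual(xi, yi, b0, b1)), math.hypot(xi - bx, yi - by))

d_squared = sum(orthogonal_residual(xi, yi, b0, b1) ** 2 for xi, yi in zip(x, y))
print(orthogonal_ss(b0, b1, x, y), d_squared)
```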

Differentiating S partially with respect to b^o₀ and b^o₁, setting these derivatives equal to zero, and taking the root of the resulting quadratic that minimizes S, we obtain the solutions:

b^o₁ = [s_yy - s_xx + ((s_xx - s_yy)² + 4s_xy²)^½] / [2s_xy] , (2)

and
b^o₀ = y* - b^o₁x* . (3)
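The step from S to (2) and (3) can be filled in as follows. Setting ∂S/∂b^o₀ = 0 gives (3) immediately. Substituting (3) back into S leaves a function of b^o₁ alone, and its first-order condition is a quadratic in b^o₁:

```latex
S(b^o_1) = (n-1)\,\frac{s_{yy} - 2\,b^o_1 s_{xy} + (b^o_1)^2 s_{xx}}{1 + (b^o_1)^2}

\frac{dS}{db^o_1} = 0
\quad\Longleftrightarrow\quad
s_{xy}\,(b^o_1)^2 + (s_{xx} - s_{yy})\,b^o_1 - s_{xy} = 0

b^o_1 = \frac{s_{yy} - s_{xx} \pm \sqrt{(s_{xx} - s_{yy})^2 + 4\,s_{xy}^2}}{2\,s_{xy}}
```

The product of the two roots is -s_xy / s_xy = -1, so the two stationary lines are perpendicular to each other. Taking the positive square root, as in (2), gives the line that minimizes S (assuming s_xy ≠ 0); the other root gives the line that maximizes it.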

(You can easily check that these values locate a minimum of S, by evaluating the (2 x 2) Hessian matrix.)

You can see from (3) that the regression line fitted using orthogonal least squares passes through the point of sample means, (x*, y*), even though that point is unlikely to be one of the observations. This is also a property of the OLS regression line, of course.

Also, the (vertical direction) residuals based on the orthogonal regression estimator sum to zero - as long as we have the intercept in the model. Again, this coincides with the situation with OLS. This is something that you can verify very quickly.
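Formulas (2) and (3), the second-order check, and the two properties just mentioned can all be verified numerically. The following is a minimal sketch on hypothetical toy data; it replaces the analytic Hessian with a central finite-difference approximation (step h = 1e-4, an arbitrary choice), so it illustrates rather than proves the result.

```python
import math


def orthogonal_fit(x, y):
    """Orthogonal least squares estimates from (2) and (3); assumes s_xy != 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    b1 = (s_yy - s_xx + math.sqrt((s_xx - s_yy) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)  # (2)
    b0 = y_bar - b1 * x_bar                                                          # (3)
    return b0, b1


def orthogonal_ss(b0, b1, x, y):
    """S(b0, b1): sum of squared orthogonal residuals."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (1 + b1 ** 2)


def numerical_hessian(f, b0, b1, h=1e-4):
    """Central finite-difference Hessian entries (H11, H12, H22) of f(b0, b1)."""
    f0 = f(b0, b1)
    h11 = (f(b0 + h, b1) - 2 * f0 + f(b0 - h, b1)) / h ** 2
    h22 = (f(b0, b1 + h) - 2 * f0 + f(b0, b1 - h)) / h ** 2
    h12 = (f(b0 + h, b1 + h) - f(b0 + h, b1 - h)
           - f(b0 - h, b1 + h) + f(b0 - h, b1 - h)) / (4 * h ** 2)
    return h11, h12, h22


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = orthogonal_fit(x, y)

# A 2 x 2 Hessian is positive definite iff H11 > 0 and its determinant > 0.
h11, h12, h22 = numerical_hessian(lambda a, b: orthogonal_ss(a, b, x, y), b0, b1)
print("Hessian positive definite:", h11 > 0 and h11 * h22 - h12 ** 2 > 0)

x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print("fitted value at x* minus y*:", b0 + b1 * x_bar - y_bar)           # ~0 up to rounding
print("sum of vertical residuals:", sum(yi - b0 - b1 * xi for xi, yi in zip(x, y)))  # ~0
```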

Let's look at a couple of examples of orthogonal regression. First, I've generated some artificial data, using (1) with β₀ = 1 ; β₁ = 2 ; σ = 1; and n = 10,000. Then I've applied both OLS and orthogonal least squares:

The graph above was created in EViews simply by "grouping" the x and y series, creating a scatter-plot, and then choosing the option to "add" both the orthogonal least squares and ordinary least squares regression lines. The OLS estimates are b₀ = 0.9916 and b₁ = 2.0107. It's easy to apply formulae (2) and (3) to find the values of b^o₀ and b^o₁.
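A rough Python equivalent of this experiment is sketched below, using only the standard library. The post doesn't say how the x values were generated, so drawing them from a Uniform(0, 10) distribution is purely an assumption, as is the random seed; the estimates it produces will therefore not reproduce the EViews numbers quoted above.

```python
import math
import random


def moments(x, y):
    """Sample means and (n - 1)-divisor variances and covariance."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return x_bar, y_bar, s_xx, s_yy, s_xy


def ols_and_orthogonal(x, y):
    """Return ((b0, b1) by OLS, (b0_o, b1_o) by orthogonal least squares)."""
    x_bar, y_bar, s_xx, s_yy, s_xy = moments(x, y)
    b1 = s_xy / s_xx
    b1_o = (s_yy - s_xx + math.sqrt((s_xx - s_yy) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    return (y_bar - b1 * x_bar, b1), (y_bar - b1_o * x_bar, b1_o)


rng = random.Random(2014)                           # arbitrary seed
beta0, beta1, sigma, n = 1.0, 2.0, 1.0, 10_000      # values from model (1) above
x = [rng.uniform(0.0, 10.0) for _ in range(n)]      # ASSUMED design for x
y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

ols, orth = ols_and_orthogonal(x, y)
print("OLS:       ", ols)
print("orthogonal:", orth)
```

For positively correlated data the orthogonal slope is never smaller than the OLS slope: it lies between the slope of the regression of y on x and the inverse of the slope of the regression of x on y. Because model (1) puts all of the noise in y, how far apart the two fitted lines end up here depends on how spread out the (assumed) x values are.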

Here's a second example, this one using actual South African household expenditure data made available by Adonis Yatchew (U. of Toronto), at the bottom of his web page. I've fitted a really basic Engel curve for food, of the Working-Leser form:

(e^f_i / E_i) = β₀ + β₁Ln(E_i) + u_i ; i = 1, 2, ..., n

where e^f is expenditure on food; E is total expenditure; and n = 7,358. Here are the results:

In this example there's negligible difference between the OLS and orthogonal regression results.

Just out of interest, how do the (sampling) properties of the orthogonal least squares estimators compare with those of the ordinary least squares estimators of β₀ and β₁? The latter estimators are best linear unbiased (by the Gauss-Markov Theorem), and with normal errors in (1) they are "best unbiased". They're also weakly consistent.

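Looking at the formula for b^o₁ in (2), we can see right away that this estimator is non-linear. That is, we can't express it as a linear function of the random y data. Accordingly, from (3), b^o₀ is not a linear estimator either. Both estimators are biased in finite samples. However, they can be shown to be weakly consistent (e.g., see Kendall and Stuart, 1961) in the errors-in-variables setting where x and y are both observed with independent errors of equal variance. Under model (1) itself, where x is measured without error, b^o₁ does not in general converge to β₁.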
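The non-linearity is easy to see numerically. For a fixed set of x values, a linear estimator must satisfy b(y_a + y_b) = b(y_a) + b(y_b). The sketch below (hypothetical data, standard-library Python) shows that the OLS slope passes this check while the orthogonal slope from (2) does not.

```python
import math


def centred_sums(x, y):
    """Sums of squares and cross-products about the means (the n - 1 divisor cancels in both slopes)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    sxx, _, sxy = centred_sums(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    sxx, syy, sxy = centred_sums(x, y)
    return (syy - sxx + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


# Hypothetical data: one fixed x design and two y vectors.
x = [1.0, 2.0, 3.0, 4.0]
y_a = [1.0, 3.0, 2.0, 5.0]
y_b = [2.0, 1.0, 4.0, 3.0]
y_sum = [a + b for a, b in zip(y_a, y_b)]

print(ols_slope(x, y_sum), ols_slope(x, y_a) + ols_slope(x, y_b))
print(orthogonal_slope(x, y_sum), orthogonal_slope(x, y_a) + orthogonal_slope(x, y_b))
```

The OLS pair agrees up to rounding; the orthogonal pair does not (roughly 1.73 versus 2.40 for these numbers).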

It can also be shown that the orthogonal regression estimators of β₀ and β₁ can be given a maximum likelihood interpretation. Specifically, they are MLEs if both x and y are observed with error, and those measurement errors are independent and normally distributed with the same variance. (See Carroll and Ruppert, 1996.) However, this is a very special case indeed!
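A small simulation illustrates this special case. In the sketch below, all parameter values and the seed are hypothetical: a latent regressor ξ ~ N(0, 2²), observed x = ξ + u and y = 1 + 2ξ + v, with u and v independent N(0, 1) errors of equal variance. OLS should then be attenuated towards 2 × 4 / (4 + 1) = 1.6, while the orthogonal slope from (2) should settle near the true value of 2.

```python
import math
import random


def slopes(x, y):
    """Return (OLS slope, orthogonal slope from (2)); the n - 1 divisor cancels in both."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ols = sxy / sxx
    orth = (syy - sxx + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ols, orth


rng = random.Random(1996)                                      # arbitrary seed
n = 100_000
xi_true = [rng.gauss(0.0, 2.0) for _ in range(n)]              # latent regressor, sd 2
x = [s + rng.gauss(0.0, 1.0) for s in xi_true]                 # x observed with N(0, 1) error
y = [1.0 + 2.0 * s + rng.gauss(0.0, 1.0) for s in xi_true]     # y with equal-variance error

ols, orth = slopes(x, y)
print("OLS slope:", ols, " orthogonal slope:", orth)
```

If the two error variances differed, the orthogonal slope would generally drift away from 2 as well; the equal-variance assumption is doing real work here.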

In this post, all that I've discussed is point estimation of the simple linear regression model using orthogonal least squares. There's lots to be said about orthogonal least squares for multiple (possibly non-linear) regression models. That's where the "Total Least Squares" estimator arises. There's also lots to be said about interval estimation and inference. Finally, it will come as no surprise to hear that there's a close connection between orthogonal least squares and principal components analysis.
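The principal-components connection can already be glimpsed in the simple regression case. Assuming hypothetical toy data, the sketch below finds the leading eigenvector of the 2 × 2 sample covariance matrix by power iteration (a deliberately generic method, chosen so that it does not reuse formula (2)) and compares its slope with b^o₁ from (2). The starting vector (1, 1) is an arbitrary assumption; power iteration only needs it not to be orthogonal to the leading eigenvector.

```python
import math


def cov_entries(x, y):
    """Sample covariance entries (s_xx, s_xy, s_yy) with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - y_bar) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    return s_xx, s_xy, s_yy


def first_pc_slope(x, y, iters=500):
    """Slope of the leading eigenvector of the sample covariance matrix, by power iteration."""
    s_xx, s_xy, s_yy = cov_entries(x, y)
    v = (1.0, 1.0)                                  # assumed starting vector
    for _ in range(iters):
        w = (s_xx * v[0] + s_xy * v[1], s_xy * v[0] + s_yy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v[1] / v[0]


def orthogonal_slope(x, y):
    """b^o_1 from formula (2)."""
    s_xx, s_xy, s_yy = cov_entries(x, y)
    return (s_yy - s_xx + math.sqrt((s_xx - s_yy) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(first_pc_slope(x, y), orthogonal_slope(x, y))
```

Since the orthogonal regression line also passes through (x*, y*), matching slopes mean the fitted line is the first principal-component axis of the data.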

However, these are matters for future posts.

References

Carroll, R. J. and D. Ruppert, 1996. The use and mis-use of orthogonal regression in linear errors-in-variables models. American Statistician, 50, 1-6.

Fuller, W. A., 1987. Measurement Error Models. Wiley, New York.

Kendall, M. G. and A. Stuart, 1961. The Advanced Theory of Statistics, Vol. 2. Charles Griffin, London.

Madansky, A., 1959. The fitting of straight lines when both variables are subject to error. Journal of the American Statistical Association, 54, 173-205.

10 comments:

Daumantas, November 17, 2014 at 11:23 AM
Simply sweet! Great post!

Could you also suggest a simple real world example where you would consider using orthogonal regression instead of OLS?
Unknown, November 20, 2014 at 6:48 AM
Very interesting!
One question: what if I run a principal component factor analysis and then I regress the dependent variable on the predicted scores? Is it the same thing?
Dave Giles, November 20, 2014 at 9:39 PM
Not quite. One connection is that in the simple regression case, where we have 2 variables, X and Y, the fitted orthogonal regression line corresponds to the first principal component.
Anonymous, March 8, 2015 at 9:49 AM
How to calculate the confidence interval for orthogonal regression?
Anonymous, April 16, 2015 at 12:51 AM
Is it possible to use orthogonal regression to calculate the imputation of missing values?