Tuesday, December 27, 2016

More on Orthogonal Regression

Some time ago I wrote a post about orthogonal regression. This is where we fit a regression line so that we minimize the sum of the squares of the orthogonal (rather than vertical) distances from the data points to the regression line.
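To make this concrete, here's a small sketch of how the single-regressor orthogonal fit can be computed. This is illustrative code of my own (the function name is my own, and numpy is assumed); it exploits the connection between orthogonal least squares and principal components:

```python
import numpy as np

def orthogonal_fit(x, y):
    """Fit y = a + b*x by minimizing orthogonal distances.

    The orthogonal regression line passes through the sample means, and
    its slope comes from the first principal component of the centered
    (x, y) pairs.
    """
    data = np.column_stack([x - x.mean(), y - y.mean()])
    # The right singular vectors of the centered data give the
    # principal directions; the first one is the fitted line's direction.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    direction = vt[0]
    b = direction[1] / direction[0]   # slope of the first PC direction
    a = y.mean() - b * x.mean()       # the line passes through the means
    return a, b
```

On noiseless data generated as y = 1 + 2x, this recovers an intercept of 1 and a slope of 2, as it should.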

Subsequently, I received the following email comment:
"Thanks for this blog post. I enjoyed reading it. I'm wondering how straightforward you think this would be to extend orthogonal regression to the case of two independent variables? Assume both independent variables are meaningfully measured in the same units."
Well, we don't have to make the latter assumption about units in order to answer this question. And we don't have to limit ourselves to just two regressors. Let's suppose that we have p of them.

In fact, I hint at the answer to the question posed above towards the end of my earlier post, when I say, "Finally, it will come as no surprise to hear that there's a close connection between orthogonal least squares and principal components analysis."

What was I referring to, exactly?
Well, just recall how we define the Principal Components of a multivariate set of data. Suppose that the data are in the form of an (n x p) matrix, X: n observations on p variables. An orthogonal transformation is applied to X, yielding r (≤ p) new variables that are linearly uncorrelated. These are the principal components (PC's) of the data, and they are ordered as follows. The first PC accounts for as much of the variability in the original data as possible. The second PC accounts for the maximum amount of the remaining variability in the data, subject to the constraint that it is uncorrelated with (i.e., orthogonal to) the first PC.

Note how orthogonality has crept into the story!

We then continue: the third PC accounts for the maximum amount of the remaining variability in the data, subject to the constraint that it is orthogonal to both the first and second PC's; and so on.
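If you'd like to see this construction in action, here's a small numpy sketch of my own (not taken from any particular package) that extracts the PC's via the singular value decomposition of the centered data matrix:

```python
import numpy as np

# Illustrative data: 100 observations on 3 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

Xc = X - X.mean(axis=0)                 # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pcs = Xc @ Vt.T                         # the principal components
var = s**2 / (len(X) - 1)               # variance accounted for by each PC
```

The `var` array comes out in decreasing order (the first PC accounts for the most variability), and the columns of `pcs` are mutually orthogonal, just as described above.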

You'll find examples of PC analysis being used in a statistically descriptive way in some earlier posts of mine - e.g., here and here.

We can use (some of) the PC's of the regressor data as explanatory variables in a regression model. A useful reference for this can be found here. Note that, by construction, these transformed explanatory variables will have zero multicollinearity.
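By way of illustration, here's a minimal sketch of principal components regression. The function name and interface are my own, and numpy is assumed; it regresses the (centered) dependent variable on the first k PC's of the regressor matrix, then maps the coefficients back to the original coordinates:

```python
import numpy as np

def pcr(X, y, k):
    """Principal components regression: OLS of y on the first k PC's of X.

    A sketch only - not production code. With k = p this reproduces
    ordinary least squares exactly.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:k].T                      # first k PC's: zero collinearity
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = vt[:k].T @ gamma                # back to the original regressors
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta
```

When all p components are retained, the fitted model coincides with OLS; dropping the low-variance components is what delivers the dimension reduction (and kills the multicollinearity).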

So, in the multivariate case, orthogonal regression is just least squares regression using a subset of the principal components of the original regressor matrix as the explanatory variables. Related to this is the so-called Total Least Squares (TLS) estimator, which involves taking the principal components of the full data-set: the dependent variable as well as the regressor matrix.
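To make the distinction concrete, here's a sketch of the TLS estimator - again, my own illustrative code, assuming numpy. Note that the SVD is taken of the full centered data matrix, including the dependent variable:

```python
import numpy as np

def tls(X, y):
    """Total least squares via the SVD of the full centered data [X, y].

    A sketch only. The fitted hyperplane is the one whose normal is the
    direction of least variance in the combined data.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Z = np.column_stack([Xc, yc])          # p regressors plus y: p+1 columns
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v = vt[-1]                             # direction of least variance
    beta = -v[:-1] / v[-1]                 # normal-vector form of the plane
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta
```

On data with an exact linear relationship the procedure recovers the true coefficients; with noisy data it minimizes the sum of squared orthogonal distances to the fitted hyperplane.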

In this earlier post I talked about using Principal Components Regression (PCR) in the context of simultaneous equations models. The problem there was that we can't construct the 2SLS estimator if the sample size is smaller than the total number of predetermined variables in the entire system. (This used to be referred to as the "under-sized sample" problem.) One solution was to use a few of the principal components of the matrix of data on the predetermined variables, instead of all of the latter variables, at the first stage of 2SLS. (Usually, just the first few principal components will capture almost all of the variability in the original data.)
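Here's a rough sketch of that idea - the function name and interface are hypothetical, and this is an illustration rather than production code. The first stage projects the endogenous regressors onto the first k PC's of the predetermined-variable matrix, rather than onto the full instrument set:

```python
import numpy as np

def pc_2sls(y, Y_endog, Z, k):
    """2SLS using the first k PC's of the instrument matrix Z in place of
    the full set of predetermined variables (the 'undersized sample' fix).
    """
    n = len(y)
    Zc = Z - Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    W = np.column_stack([np.ones(n), Zc @ vt[:k].T])   # PC instruments
    # First stage: project the endogenous regressors onto the PC's.
    P = W @ np.linalg.lstsq(W, Y_endog, rcond=None)[0]
    # Second stage: OLS of y on the first-stage fitted values.
    A = np.column_stack([np.ones(n), P])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs
```

If k is set equal to the full column rank of Z, the PC instruments span the same space as Z itself, and the estimator reduces to ordinary 2SLS; the payoff comes from choosing k smaller than the sample size when the full instrument set is too large.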

There are some useful discussions of this that you might want to refer to. For instance, Vincent Zoonekynd has a nice illustration here. I particularly recommend two other pieces that discuss PCR using R - this post, "Principal components regression in R, an operational tutorial", by John Mount, on the Revolutions blog; and this post, "Performing principal components regression (PCR) in R", by Michy Alice, on the Quantide site.

PCR also gets a brief mention in this earlier post of mine - see the discussion of the last paper mentioned in that post.

So, the bottom line is that while my introductory post dealt with just the single-regressor case, it's straightforward to apply orthogonal multiple regression - it's just regression using the first few principal components of the regressor matrix.

© 2016, David E. Giles


  1. Thank you so much for this blog post. I really enjoyed reading it and found the topic very useful in our work: we use orthogonal multiple regression for tax revenue forecasting at the Institute of Fiscal Studies in Spain. Here we have an example of how we use macroeconomic partial indicators and Principal Component Analysis to obtain orthogonal regressors for a transfer function.
    It is the working paper " Combining the predictive ability of factorial analysis and transfer functions for VAT revenue forecasting"
    We use SAS software instead of R software.
    Thank you for taking the time to share all this material and these helpful discussions of the topic.

  2. Perhaps I read the post too quickly, but I cannot quite make the ends meet. Total least squares (TLS) is one thing, principal components regression (PCR) is another. TLS is related to the principal components of all variables (the p independent ones and the dependent one). PCR is related to the principal components of the p independent variables alone (_excluding_ the dependent variable). So in TLS the principal components are obtained from a system of p+1 variables, while in PCR the principal components are obtained from a system of p variables. Thus PCR is not a multivariate extension of TLS; they are two different beasts. (An interesting related method is partial least squares, which is in some ways superior to, and more intuitive than, PCR.)

    1. You're right - what I had written was far from accurate, and I have amended the post accordingly. Thanks for pointing this out.