Monday, November 10, 2014

Reverse Regression Follow-up

At the end of my recent post on Reverse Regression, I posed three simple questions - homework for the students among you, if you will. 

Here they are again, with brief "solutions":

First recall the context. We fitted the following simple regression model, using OLS:

            yᵢ = βxᵢ + εᵢ .                                                              (1)

All of the data are calculated as deviations from their respective sample means.

The OLS estimator of β is, 

            b = Σ(xᵢyᵢ) / Σ(xᵢ²) ,

where the summations are for i = 1 to n (the sample size).

Then we estimated the "reverse regression":

           xᵢ = αyᵢ + uᵢ ,                                                              (2)

and the OLS estimator of α is,

           a = Σ(xᵢyᵢ) / Σ(yᵢ²).

We showed that ab ≤ 1, and hence that a ≤ (1 / b) when b > 0, regardless of the values of the data in the sample.
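For readers who want a quick numerical check, here's a small sketch in Python with NumPy (my own illustration, not part of the original post; the data are simulated). It exploits the fact that ab equals the squared sample correlation, which cannot exceed 1:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a sample and express it as deviations from sample means,
# as in the text.
x = rng.normal(size=50)
y = x + 0.3 * rng.normal(size=50)
x, y = x - x.mean(), y - y.mean()

b = np.sum(x * y) / np.sum(x * x)   # OLS slope from regressing y on x
a = np.sum(x * y) / np.sum(y * y)   # OLS slope from the reverse regression

# ab is the squared sample correlation, so ab <= 1 (Cauchy-Schwarz),
# and hence a <= 1/b whenever b > 0.
print(a * b <= 1.0)   # True
print(a <= 1.0 / b)   # True (b > 0 for this sample)
```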

The questions I posed, with their answers, are as follows:

1.  Under what circumstances will inequality (4), i.e. a ≤ (1 / b), hold as an equality?
If each xᵢ is exactly proportional to the corresponding yᵢ, say xᵢ = kyᵢ for all i, then a = k and b = 1/k, and so a = (1 / b).
2.  What can you say about the relationship between the two R² values that we get when we estimate (1) and (2) by OLS?
They must be identical to each other!
For (1), the sum of the squared residuals is:
      Σ(yᵢ - bxᵢ)² = Σ(yᵢ²) + b²Σ(xᵢ²) - 2bΣ(xᵢyᵢ)
                   = Σ(yᵢ²) - [Σ(xᵢyᵢ)]² / Σ(xᵢ²)
                   = [Σ(xᵢ²)Σ(yᵢ²) - [Σ(xᵢyᵢ)]²] / Σ(xᵢ²) .                      (3)
So, the corresponding R² is:
      Rb² = 1 - [Σ(xᵢ²)Σ(yᵢ²) - [Σ(xᵢyᵢ)]²] / [Σ(xᵢ²)Σ(yᵢ²)] = [Σ(xᵢyᵢ)]² / [Σ(xᵢ²)Σ(yᵢ²)].
Note that this expression is totally symmetric in the xᵢ's and yᵢ's. So, obviously, the R² associated with equation (2), say Ra², equals Rb².
Moreover, we can see from the last expression that Rb² is just the squared (Pearson) sample correlation between the x's and the y's. So, of course it's the same in each case.
3.  What can you say about the relationship between the t-ratios for testing H0: β = 0 in (1), and for testing H0': α = 0 in (2)?

Once again, they must also be identical to each other! 

From equation (3), the unbiased estimator of the error variance in equation (1) is:

sb² = [Σ(xᵢ²)Σ(yᵢ²) - [Σ(xᵢyᵢ)]²] / [(n - 1)Σ(xᵢ²)] ,
and so the "standard error" associated with the OLS estimator, b, is:
s.e.(b) = {[Σ(xᵢ²)Σ(yᵢ²) - [Σ(xᵢyᵢ)]²] / [(n - 1)[Σ(xᵢ²)]²]}½ ,
and the t-ratio for testing H0 : β = 0 is:
tβ = [Σ(xᵢyᵢ) / Σ(xᵢ²)] / s.e.(b) = (n - 1)½ Σ(xᵢyᵢ) / [Σ(xᵢ²)Σ(yᵢ²) - [Σ(xᵢyᵢ)]²]½ .
Again, this last expression is symmetric in the xᵢ's and yᵢ's, and so the t-ratio for testing H0': α = 0, say tα, is equal to tβ.
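Both symmetry results (Questions 2 and 3) are easy to verify numerically. Here is a small sketch in Python with NumPy (my own illustration, with simulated data, not part of the original post), computing the R² and the t-ratio for each direction directly from the formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)
y = 1.2 * x + 0.5 * rng.normal(size=n)
x, y = x - x.mean(), y - y.mean()   # deviations from sample means

def no_intercept_fit(x, y):
    """OLS of y on x (no intercept); returns (R-squared, t-ratio for slope = 0)."""
    n = x.size
    b = np.sum(x * y) / np.sum(x * x)
    resid = y - b * x
    r2 = 1.0 - np.sum(resid**2) / np.sum(y * y)
    s2 = np.sum(resid**2) / (n - 1)          # unbiased error-variance estimator
    t = b / np.sqrt(s2 / np.sum(x * x))
    return r2, t

r2_b, t_beta = no_intercept_fit(x, y)    # regression (1): y on x
r2_a, t_alpha = no_intercept_fit(y, x)   # regression (2): x on y

print(np.isclose(r2_b, r2_a))        # True
print(np.isclose(t_beta, t_alpha))   # True
```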
For any non-believers, here's a little illustration using EViews with some artificial data. The latter are available on the data page for this blog. You can replicate the results with any software of your choice.

Equation (1): [EViews estimation output for the regression of Y on X]

Equation (2): [EViews estimation output for the regression of X on Y]

From equation (2), a = 2.063878 < (1 / b) = (1 / 0.412) = 2.4272. The R² values are the same, as are the two slope coefficient t-ratios.

You'll also notice that the sample means of both X and Y are non-zero, but I retained an intercept in the models. As I noted at the beginning of my previous post, this is equivalent to going through the analysis assuming that all of the data have been expressed as deviations about their respective sample means.
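The same checks go through with an intercept included, as in the EViews output above. Here's a hedged sketch in Python with NumPy (simulated data, not the artificial data set from this blog's data page), where demeaning the data handles the intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(loc=5.0, size=n)          # non-zero sample mean, as in the example
y = 2.0 + 0.8 * x + rng.normal(size=n)

def fit(x, y):
    """OLS of y on a constant and x; returns (slope, R-squared, slope t-ratio)."""
    n = x.size
    xd, yd = x - x.mean(), y - y.mean()  # demeaning is equivalent to the intercept
    b = np.sum(xd * yd) / np.sum(xd * xd)
    resid = yd - b * xd
    s2 = np.sum(resid**2) / (n - 2)      # n - 2: intercept plus slope estimated
    t = b / np.sqrt(s2 / np.sum(xd * xd))
    r2 = 1.0 - np.sum(resid**2) / np.sum(yd * yd)
    return b, r2, t

b, r2_yx, t_yx = fit(x, y)   # equation (1): y on x
a, r2_xy, t_xy = fit(y, x)   # equation (2): x on y

print(a * b <= 1.0)               # True: equivalent to a <= 1/b when b > 0
print(np.isclose(r2_yx, r2_xy))   # True: identical R-squared values
print(np.isclose(t_yx, t_xy))     # True: identical slope t-ratios
```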

© 2014, David E. Giles


  1. Brilliant Professor thank you.

    Would the r-squared thus remain the same for the reversed regression if a constant is included (y = a + bx + e, vs. x = c + dy + u)?


    1. Yes, absolutely - you can see this illustrated in equations (1) and (2), in fact. And it will always hold for any X and Y because the R-squared in such cases is just the square of the sample correlation between X and Y.

  2. Dear Professor,

    Would the same results for matching r-squared hold if a constant (intercept) is included?
    And thus the adjusted r-squared is also unchanged, since this just uses the r-squared with the degrees of freedom adjusted for the included intercept?

    Thank you, wonderfully useful blog!

    1. "Yes" to the second part of your second question too. :-)

  3. Dear Prof Giles,

    I was wondering, what about if we wanted to test the joint statistical significance of the coefficients alpha and beta?
    i.e. Ho: a=b=0.

    Thank you.

    1. You don't need to. If you can't reject the hypothesis that alpha=0, then you conclude there is no linear relationship between x and y; hence there is no linear relationship between y and x; hence you cannot reject the hypothesis that beta=0. (And vice versa.) And this is precisely why the 2 t-ratios are identical.


