Monday, November 10, 2014

Reverse Regression Follow-up

At the end of my recent post on Reverse Regression, I posed three simple questions - homework for the students among you, if you will. 

Here they are again, with brief "solutions":

First recall the context. We fitted the following simple regression model, using OLS:

            y_i = β x_i + ε_i .                                                              (1)

All of the data are calculated as deviations from their respective sample means.

The OLS estimator of β is, 

            b = Σ(x_i y_i) / Σ(x_i^2) ,

where the summations are for i = 1 to n (the sample size).

Then we estimated the "reverse regression":

           x_i = α y_i + u_i ,                                                              (2)

and the OLS estimator of α is,

           a = Σ(x_i y_i) / Σ(y_i^2).

We showed that a ≤ (1 / b), regardless of the values of the data in the sample.
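Before turning to the questions, here is a quick numerical check of that inequality. This is a sketch of my own (it is not part of the original post), using NumPy and purely artificial data:

```python
import numpy as np

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(2014)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Express the data as deviations from their sample means, as in the post
x = x - x.mean()
y = y - y.mean()

b = (x @ y) / (x @ x)   # OLS slope from regressing y on x
a = (x @ y) / (y @ y)   # OLS slope from the reverse regression

# The product a*b equals the squared sample correlation, which cannot
# exceed 1; when b > 0, that is exactly the statement a <= (1 / b)
r2 = (x @ y) ** 2 / ((x @ x) * (y @ y))
print(a * b, r2)
```

The seed and the data-generating process here are arbitrary; any data set will do, since the result is algebraic rather than statistical.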

The questions I posed, with their answers, are as follows:

1.  Under what circumstances will the inequality, a ≤ (1 / b), hold as an equality?
If each value of x_i is exactly proportional to the corresponding y_i value, with the same proportionality constant for all i, then the sample correlation is ±1, and a = (1 / b).
2.  What can you say about the relationship between the two R^2 values that we get when we estimate (1) and (2) by OLS?
They must be identical to each other!
For (1), the sum of the squared residuals is:
      Σ(y_i - b x_i)^2 = Σ(y_i^2) + b^2 Σ(x_i^2) - 2b Σ(x_i y_i)
                       = Σ(y_i^2) - [Σ(x_i y_i)]^2 / Σ(x_i^2)
                       = [Σ(x_i^2) Σ(y_i^2) - [Σ(x_i y_i)]^2] / Σ(x_i^2) .                      (3)
So, the corresponding R^2 is:
      R_b^2 = 1 - [Σ(x_i^2) Σ(y_i^2) - [Σ(x_i y_i)]^2] / [Σ(x_i^2) Σ(y_i^2)] = [Σ(x_i y_i)]^2 / [Σ(x_i^2) Σ(y_i^2)].
Note that this expression is completely symmetric in the x_i's and y_i's. So, obviously, the R^2 associated with equation (2), say R_a^2, equals R_b^2.
Moreover, we can see from the expression for R_b^2 that it is just the squared (Pearson) sample correlation between the x's and the y's. So, of course, it's the same in each case.
3.  What can you say about the relationship between the t-ratios for testing H0: β = 0 in (1); and for testing H0': α = 0 in (2)?

Once again, they must also be identical to each other! 

From equation (3), the unbiased estimator of the error variance in equation (1) is:

s_b^2 = (1 / (n - 1)) [Σ(x_i^2) Σ(y_i^2) - [Σ(x_i y_i)]^2] / Σ(x_i^2) ,
and so the "standard error" associated with the OLS estimator, b, is:
s.e.(b) = {(1 / (n - 1)) [Σ(x_i^2) Σ(y_i^2) - [Σ(x_i y_i)]^2] / [Σ(x_i^2)]^2}^(1/2) ,
and the t-ratio for testing H0 : β = 0 is:
t_β = [Σ(x_i y_i) / Σ(x_i^2)] / s.e.(b) = (n - 1)^(1/2) Σ(x_i y_i) / {Σ(x_i^2) Σ(y_i^2) - [Σ(x_i y_i)]^2}^(1/2) .
Again, this last expression is symmetric in the x_i's and y_i's, and so the t-ratio for testing H0': α = 0, say t_α, is equal to the expression for t_β.
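Both of these identities (for the R^2 values and for the t-ratios) can be checked directly in code. Here is a short Python/NumPy sketch of my own, with simulated data rather than the post's EViews file:

```python
import numpy as np

# Arbitrary seed and data-generating process; hypothetical data
rng = np.random.default_rng(123)
n = 40
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
x, y = x - x.mean(), y - y.mean()  # deviations from sample means

b = (x @ y) / (x @ x)              # slope estimate for equation (1)
a = (x @ y) / (y @ y)              # slope estimate for equation (2)

# R^2 = 1 - SSR/SST for each (no-intercept) regression on demeaned data
R2_b = 1 - ((y - b * x) @ (y - b * x)) / (y @ y)
R2_a = 1 - ((x - a * y) @ (x - a * y)) / (x @ x)

# t-ratios, using the (n - 1) degrees-of-freedom variance estimator
s2_b = ((y - b * x) @ (y - b * x)) / (n - 1)
s2_a = ((x - a * y) @ (x - a * y)) / (n - 1)
t_b = b / np.sqrt(s2_b / (x @ x))
t_a = a / np.sqrt(s2_a / (y @ y))

print(R2_b, R2_a)   # identical, up to rounding
print(t_b, t_a)     # identical, up to rounding
```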
For any non-believers, here's a little illustration using EViews with some artificial data. The latter are available on the data page for this blog. You can replicate the results with any software of your choice.

Equation (1): [EViews output for the regression of Y on X, with an intercept]

Equation (2): [EViews output for the reverse regression of X on Y, with an intercept]

From equation (2), a = 2.063878 < (1 / b) = (1 / 0.412) = 2.4272. The R^2 values are the same, as are the two slope coefficients' t-ratios.

You'll also notice that the sample means of both X and Y are non-zero, so I retained an intercept in both models. As I noted at the beginning of my previous post, this is equivalent to going through the analysis with all of the data expressed as deviations about their respective sample means.
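That equivalence is also easy to verify numerically. The sketch below (again my own illustration, with made-up data) compares the slope from a regression that includes an intercept with the slope from the no-intercept regression on demeaned data:

```python
import numpy as np

# Hypothetical data with non-zero sample means
rng = np.random.default_rng(7)
x = 3.0 + rng.normal(size=60)
y = 4.0 + 0.4 * x + rng.normal(size=60)

# Slope when an intercept is included, via least squares on [1, x]
X = np.column_stack([np.ones_like(x), x])
slope_with_intercept = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Slope from regressing demeaned y on demeaned x, with no intercept
xd, yd = x - x.mean(), y - y.mean()
slope_demeaned = (xd @ yd) / (xd @ xd)

print(slope_with_intercept, slope_demeaned)   # the same, up to rounding
```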

© 2014, David E. Giles
