Sunday, October 27, 2019

Reporting an R-Squared Measure for Count Data Models

This post was prompted by an email query that I received some time ago from a reader of this blog. I thought that a more "expansive" response might be of interest to other readers.

In spite of its many limitations, it's standard practice to include the value of the coefficient of determination (R²) - or its "adjusted" counterpart - when reporting the results of a least squares regression. Personally, I think that R² is one of the least important statistics to include in our results, but we all do it. (See this previous post.)

If the regression model in question is linear (in the parameters) and includes an intercept, and if the parameters are estimated by Ordinary Least Squares (OLS), then R² has a number of well-known properties. These include:
  1. 0 ≤ R² ≤ 1.
  2. The value of R² cannot decrease if we add regressors to the model.
  3. The value of R² is the same, whether we define this measure as the ratio of the "explained sum of squares" to the "total sum of squares" (R_E²), or as one minus the ratio of the "residual sum of squares" to the "total sum of squares" (R_R²).
  4. There is a correspondence between R² and a significance test on all slope parameters; and there is a correspondence between changes in (the adjusted) R² as regressors are added, and significance tests on the added regressors' coefficients. (See here and here.)
  5. R² has an interpretation in terms of information content of the data.
  6. R² is the square of the (Pearson) correlation (R_C²) between actual and "fitted" values of the model's dependent variable.
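
For reference, the three definitions in properties 3 and 6, and the standard link behind the first part of property 4, can be written out as follows. The symbols are my own shorthand rather than notation from the post: y_i is the actual value, \hat{y}_i the fitted value, \bar{y} the sample mean of y, n the number of observations, and k the number of slope regressors (excluding the intercept). The last line is the usual F-statistic for testing that all slope coefficients are zero in the OLS model with an intercept.

```latex
\begin{align*}
R_E^2 &= \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \\
R_R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \\
R_C^2 &= \left[ \operatorname{corr}(y, \hat{y}) \right]^2, \\
F &= \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}.
\end{align*}
```
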
However, as soon as we're dealing with a model that excludes an intercept or is non-linear in the parameters, or we use an estimator other than OLS, none of the above properties are guaranteed.
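
A small simulation can make this concrete. The sketch below is only illustrative: the data are simulated, and the helper name `r2_variants` and the design (one relevant regressor `x1`, one irrelevant regressor `x2`, a true intercept of 5) are my own assumptions, not anything taken from the reader's query. It computes the three versions of the measure from a least squares fit, once with an intercept and once without.

```python
import numpy as np

def r2_variants(y, X):
    """Return (R_E^2, R_R^2, R_C^2) from a least squares fit of y on X.

    Hypothetical helper for illustration. X must already contain a column
    of ones if an intercept is wanted.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ess = np.sum((fitted - y.mean()) ** 2)   # "explained" sum of squares
    rss = np.sum((y - fitted) ** 2)          # residual sum of squares
    r2_e = ess / tss
    r2_r = 1.0 - rss / tss
    r2_c = np.corrcoef(y, fitted)[0, 1] ** 2
    return r2_e, r2_r, r2_c

# Simulated data (assumed design): y depends on x1 only, with intercept 5.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                      # irrelevant extra regressor
y = 5.0 + 1.5 * x1 + rng.normal(size=n)

X_int = np.column_stack([np.ones(n), x1])
X_int_more = np.column_stack([np.ones(n), x1, x2])
X_noint = x1.reshape(-1, 1)                  # intercept wrongly omitted

with_intercept = r2_variants(y, X_int)
with_extra = r2_variants(y, X_int_more)
no_intercept = r2_variants(y, X_noint)

print("OLS with intercept     (R_E^2, R_R^2, R_C^2):", with_intercept)
print("... plus one regressor (R_E^2, R_R^2, R_C^2):", with_extra)
print("OLS without intercept  (R_E^2, R_R^2, R_C^2):", no_intercept)
```

With the intercept included, the three numbers should coincide, lie between zero and one, and not fall when `x2` is added. Once the intercept is dropped, the three definitions give different answers, and with this design the "one minus RSS/TSS" version turns negative.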