Sunday, October 27, 2019

Reporting an R-Squared Measure for Count Data Models

This post was prompted by an email query that I received some time ago from a reader of this blog. I thought that a more "expansive" response might be of interest to other readers...

In spite of its many limitations, it's standard practice to include the value of the coefficient of determination (R2) - or its "adjusted" counterpart - when reporting the results of a least squares regression. Personally, I think that R2 is one of the least important statistics to include in our results, but we all do it. (See this previous post.)

If the regression model in question is linear (in the parameters) and includes an intercept, and if the parameters are estimated by Ordinary Least Squares (OLS), then R2 has a number of well-known properties. These include:
  1. 0 ≤ R2 ≤ 1.
  2. The value of R2 cannot decrease if we add regressors to the model.
  3. The value of R2 is the same, whether we define this measure as the ratio of the "explained sum of squares" to the "total sum of squares" (RE2); or as one minus the ratio of the "residual sum of squares" to the "total sum of squares" (RR2).
  4. There is a correspondence between R2 and a significance test on all slope parameters; and there is a correspondence between changes in (the adjusted) R2 as regressors are added, and significance tests on the added regressors' coefficients.   (See here and here.)
  5. R2 has an interpretation in terms of information content of the data.  
  6. R2 is the square of the (Pearson) correlation (RC2) between actual and "fitted" values of the model's dependent variable. 
However, as soon as we're dealing with a model that excludes an intercept or is non-linear in the parameters, or we use an estimator other than OLS, none of the above properties are guaranteed.

For example, when reporting a linear model that's been estimated by Instrumental Variables, we get different R2 values depending on which of the two  definitions noted in property 3 above is adopted. Similarly, when estimating Logit and Probit models (for instance), most econometrics packages report several "pseudo-R2" statistics, because there's no single measure that has all of the desirable features that we're used to in the linear model/OLS case.

So-called "count" data arise frequently in empirical economics. These are data that take values that are only non-negative integers, namely 0, 1, 2, 3, 4, ........ Models for such data are often based on the Poisson or negative binomial distributions, although other distributions may also be used. Regressors enter the model by equating the mean of the chosen distribution to a positive function of these variables and their coefficients.

For instance, if the yi data (i = 1, 2, ...., n) are being modelled using a Poisson distribution with a mean of μ, then we typically assign μi = exp[xi'β], using familiar regression notation. The resulting non-linear model is then estimated by MLE (or quasi-MLE).
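To make this concrete, here is a minimal sketch (in Python, using statsmodels with simulated data, so all variable names are purely illustrative) of fitting such a Poisson regression by MLE:

```python
# A minimal sketch (simulated data, illustrative names) of estimating a
# Poisson regression in which the conditional mean is mu_i = exp(x_i'beta).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(123)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)                       # intercept plus one regressor
y = rng.poisson(np.exp(0.5 + 0.8 * x))       # count-valued dependent variable

poisson_fit = sm.Poisson(y, X).fit(disp=0)   # MLE of beta
mu_hat = poisson_fit.predict(X)              # fitted means, mu_i* = exp(x_i'beta*)
print(poisson_fit.params)
```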

What's a sensible way of reporting an R2 measure for an estimated Poisson regression?

As with the Logit-Probit case noted above, several possibilities suggest themselves. However, unlike that other case, when modelling "count" data there is actually one definition of R2 that really stands out as the obvious choice.

What is it?

Before answering this question, let's look at how RR2, RE2, and RC2 behave when applied in the context of Poisson, or negative binomial, regression. Some key facts include:
  • The three measures will generally differ in value from one another.
  • We still have 0 ≤ RC2 ≤ 1. However, although RR2 ≤ 1 it can be negative (even if an intercept is included in the model); and although RE2 ≥ 0 it can be greater than one (even with an intercept).
  • All three measures can decrease as regressors are added to the model. 
When we compare these results with the six properties noted above for the OLS case, they suggest that these R2 measures are probably best avoided with count data models. Interestingly, it's RR2 that's reported as a matter of course by the EViews package. Stata, on the other hand, reports McFadden's "pseudo-R2" for these models, but its properties are no better.
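For readers who want to see this behaviour for themselves, the following continuation of the earlier sketch computes RR2, RE2, and RC2 from the fitted Poisson means (again, just an illustration with simulated data):

```python
# The three "OLS-style" R-squared measures, computed from y and the fitted
# means mu_hat obtained above. In general the three values differ, and
# R_R^2 can even be negative if the model fits poorly.
ybar = y.mean()
rss = np.sum((y - mu_hat) ** 2)             # "residual" sum of squares
ess = np.sum((mu_hat - ybar) ** 2)          # "explained" sum of squares
tss = np.sum((y - ybar) ** 2)               # total sum of squares

R2_R = 1.0 - rss / tss                      # one minus RSS/TSS
R2_E = ess / tss                            # ESS/TSS
R2_C = np.corrcoef(y, mu_hat)[0, 1] ** 2    # squared correlation of y and fitted values
print(R2_R, R2_E, R2_C)
```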

Cameron and Windmeijer (1996) effectively answer the question that I posed above.

They consider various R2-type measures for count data models. These measures differ primarily in the type of residuals (from the estimated model) that are used in their construction. As in the case of a linear regression, the usual, or "raw", residuals are the differences between the actual yi values and their "predicted" mean values. That is, they're of the form (yi - μi*), where μi* = exp[xi'β*], and β* is the MLE of the β vector. These residuals give us RR2, noted above.

In regression analysis in general, there are actually lots of different forms of residuals that can be constructed, and these can be useful in various situations - especially with generalized linear models (of which the Poisson count model is an example). Some examples include the Pearson (standardized) residuals and the so-called "deviance" residuals. (For more on the notion of "deviance" and goodness-of-fit, see this post.)
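As a brief aside, these alternative residuals are readily available in standard software. For example, statsmodels' GLM interface exposes them directly; a sketch, re-using the simulated y and X from above:

```python
# Pearson and deviance residuals from a Poisson GLM fit.
glm_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
pearson_resid = glm_fit.resid_pearson     # (y_i - mu_i*)/sqrt(mu_i*) for the Poisson family
deviance_resid = glm_fit.resid_deviance   # the deviance residuals discussed below
```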

Cameron and Windmeijer (1996) consider the properties of R2 measures for Poisson and negative binomial models based on both of these other types of residuals, as well as on the "raw" residuals. (Cameron and Windmeijer (1997) extend these results to a variety of other non-linear models.)

They make a convincing case for constructing an R2 measure using the deviance residuals, when working with a Poisson regression model or the negative binomial (NegBin2) model.

(As an aside, when the model is linear and we use OLS, the deviance residuals are just the usual residuals.)

For the Poisson model, the ith deviance residual is defined as

di = sign(yi - μi*)[2{yilog(yi / μi*) - (yi - μi*)}]½       ;     i = 1, 2, ...., n

and the deviance R2 for that model is defined as:

RD,P2 = 1 - Σ{yilog(yi / μi*) - (yi - μi*)} / Σ{yilog(yi / ybar)},

where here and below all summations are for i = 1, 2, ...., n.

If the model includes an intercept, then this formula simplifies to:

RD,P2 = 1 - Σ{yilog(yi / μi*)} / Σ{yilog(yi / ybar)}.

(Note: if yi = 0, then yilog(yi) = 0. In this case, di = - [2μi*]½.)

Importantly, RD,P2 satisfies the properties 1 to 5 noted earlier.
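Here is a sketch of how these quantities can be computed directly from the formulas above (re-using y, mu_hat, and ybar from the earlier simulated example; scipy's xlogy handles the yi = 0 convention automatically):

```python
# Deviance residuals and the deviance R-squared for the Poisson model.
from scipy.special import xlogy            # xlogy(a, b) = a*log(b), with 0*log(0) = 0

dev_terms = xlogy(y, y / mu_hat) - (y - mu_hat)
d = np.sign(y - mu_hat) * np.sqrt(2.0 * dev_terms)            # deviance residuals

R2_D_P = 1.0 - np.sum(dev_terms) / np.sum(xlogy(y, y / ybar))
print(R2_D_P)
```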

In the case of the NegBin2 model, the corresponding R2 takes the form:

RD,NB2 = 1 - (A / B) ,

where

A = Σ{yilog(yi / μi*) - (yi + 1/α*)log[(yi + 1/α*) / (μi* + 1/α*)]}

and

B = Σ{yilog(yi / ybar) - (yi + 1/α*)log[(yi + 1/α*) / (ybar + 1/α*)]}.

("ybar" is the sample average of the yi values;  and α* is the MLE of the dispersion parameter for the NegBin2 distribution.)

The RD,NB2 goodness-of-fit measure satisfies properties 1, 3 and 4 noted earlier.
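And, for completeness, a corresponding sketch for the NegBin2 case. Here I use statsmodels' NB2 model, for which (as an assumption about the package's parameter layout worth checking) the estimated dispersion parameter appears as the last element of the parameter vector; again, purely illustrative:

```python
# Deviance R-squared for the NegBin2 model, following A and B above.
nb2_fit = sm.NegativeBinomial(y, X, loglike_method='nb2').fit(disp=0)
mu_nb = nb2_fit.predict(X)                 # fitted means
alpha_hat = nb2_fit.params[-1]             # MLE of the dispersion parameter (assumed layout)
a_inv = 1.0 / alpha_hat

A = np.sum(xlogy(y, y / mu_nb)
           - (y + a_inv) * np.log((y + a_inv) / (mu_nb + a_inv)))
B = np.sum(xlogy(y, y / ybar)
           - (y + a_inv) * np.log((y + a_inv) / (ybar + a_inv)))
R2_D_NB2 = 1.0 - A / B
print(R2_D_NB2)
```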

So, when it comes to reporting an R2 for count data models, the usual such measure - based on the "raw" residuals - is generally a very poor choice. Of the other options that are available, the R2 measures constructed using the so-called "deviance residuals" stand out as excellent contenders.


References

Cameron, A. C. & F. A. C. Windmeijer, 1996. R-squared measures for count data regression models with applications to health-care utilization. Journal of Business and Economic Statistics, 14, 209-220.

Cameron, A. C. & F. A. C. Windmeijer, 1997. An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77, 329-342.

© 2019, David E. Giles

2 comments:

  1. Thanks for the informative post Professor.
    I have a question about the allocation of R² among regressors. Some say that there is little point in doing so, but others say the opposite and have developed algorithms for this purpose.
    What do you think about this topic Professor?

    Replies
    1. R-squared is a positive monotonic function of the usual F-statistic for testing that all of the slope coefficients are zero. In the same way, you can "allocate" R-squared so that its parts are expressed in terms of similar functions of the F-statistics for testing that different sub-sets of the coefficients are jointly zero. So, there's really nothing to be added by such an "allocation". Just test the sub-set hypotheses that are of economic interest.

