Wednesday, September 11, 2013

Can Your Results be Replicated?

It's a "given" that your empirical results should be able to be replicated by others. That's why more and more journals are encouraging or requiring that authors of such papers "deposit" their data and code with the journal as a condition of acceptance for publication.

That's all well and good. However, replicating someone's results using their data and code may not mean very much!

This point was highlighted in a guest post today on the Political Science Replication blog site. The piece, by Mark Bell and Nicholas Miller, is titled "How to Persuade Journals to Accept Your Replication Paper". You should all read it!

Especially if you favour the STATA Stata package!

Here are a few excerpts, to whet your appetites:
"We were easily able to replicate Rauchhaus’ key findings in Stata, but couldn’t get it to work in R. It took us a long while to work out why, but the reason turned out to be an error in Stata: Stata was finding a solution when it shouldn’t have (because of separation in the data). This solution, as we show in the paper, was wrong – and led Rauchhaus’ paper to overestimate the effect of nuclear weapons on conflict by a factor of several million.
It’s very easy when you’re working through someone else’s code to be satisfied when you’ve successfully got your computer to produce the same numbers you see in the paper. But replicating a published result in one software package does not mean that you necessarily understand what that software package is doing, that the software is doing it correctly, or that doing it at all is the appropriate thing to do – in our case none of those were true and working out why was key to writing our paper."
"Learning new methods as you conduct the replication ensures that even if you don’t end up publishing your replication paper, you’ll have learnt new skills that will be valuable down the road."
(The emphasis in red is mine.)

By the way - Brad and Nicholas are both grad. students. More power to them!


© 2013, David E. Giles

13 comments:

  1. I took a look at the paper by the two grad students. I think this is what happened.

    With logit and probit models, ML requires that predicted probabilities for different "cells" (i.e. subsamples identified by dummy variables) match empirical frequencies. So if all observations in the same cell make the same choice (I think this is what they mean by "separation in the data"), then coefficients aren't identified (because linear combinations of them have to shoot off to positive or negative infinity). Apparently, STATA has some way of resolving the lack of identification so that some answer comes out of the package. It's kind of like having a perfect multicollinearity problem in a linear regression, but the package picks a particular solution out for you from the space of solutions to the LS problem--it would be ad hoc and different identification schemes could generate very different parameter estimates, although the predicted value of the dependent variable would be the same in all cases. (FWIW, my quick reading didn't make it clear how the graduate students resolved the lack of identification.)

    My bottom line is that the problem wasn't in the software (I don't know if it allows you to check for this problem and the authors didn't or if it should have printed a warning regardless). The problem was the failure of the original authors to warn the reader that the parameter estimates weren't identified in the data, only the predicted probabilities.

    ReplyDelete
    Replies
    1. Angelo - that's a very good point. I interpreted "separation" in the same way you did - it's a pretty common way of describing that phenomenon. The original authors certainly should have provided that warning.

      More generally, packages that produce any results in this case (or the exact multicollinearity example you cited) still worry me. I'd rather they stopped, and produced an intelligible message that highlights the "problem".

      In addition there remains the point that genuine replication should preferably include alternative software. When you really get down to the computational side of things, it should really include different Operating Systems too, but not for the sort of things we have in mind.

      Delete
    2. Stata does at least tell you about the problem and what it did. Before the table of coefficients it might, for example, say "X == 0 predicts success perfectly" and then tell you that it dropped that variable and some number of observations. Of course, it's up to the user to look at ALL the warning messages and understand what they mean.

      Delete
  2. "More generally, packages that produce any results in this case (or the exact multicollinearity example you cited) still worry me. I'd rather they stopped, and produced an intelligible message that highlights the "problem"."

    The glm function in R for example will not stop and
    instead gives an answer under complete separation.
    Here's an example from an lme4 github issue
    (https://github.com/lme4/lme4/issues/124):

    > set.seed(101)
    > d <- data.frame(y=rbinom(1000,size=1,p=0.5),
    + x=runif(1000),
    + f=factor(rep(1:20,each=50)),
    + x2=rep(0:1,c(999,1)))
    > glm(y ~ x+x2, data=d, family=binomial)

    Call: glm(formula = y ~ x + x2, family = binomial, data = d)

    Coefficients:
    (Intercept) x x2
    -0.10037 0.03549 -12.50117

    Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
    Null Deviance: 1385
    Residual Deviance: 1383 AIC: 1389

    That amazing coefficient of -12.50117 is just a symptom of
    complete separation.

    ReplyDelete
    Replies
    1. Yikes!! I don't like that either!!!

      Delete
    2. Hi Dave,

      In the same vein as the previous poster: I don't think there is any need to blame a particular software here (BTW note it's Stata, not STATA). Apparently, the mistake of the original authors from the 2009 study was to use the wrong model for their purposes, but your post seems to insinuate that their mistake was to use the "wrong" software package. For example, using data generated in Stata (see http://www.stata.com/statalist/archive/2013-09/msg00595.html) the -geeglm- function from the -geepack- package (version 1.1-6) in R also provides estimates without any warning messages etc.:

      > #--------------------------------
      > require(Hmisc)
      > require(geepack)
      >
      > dat = stata.get("gee_check1.dta")
      >
      > M1 <- geeglm( y ~ x.1 + x.3, data=dat, id=id,
      + family=binomial(link="logit"),
      + corstr="exchangeable")
      > summary(M1)

      Call:
      geeglm(formula = y ~ x.1 + x.3, family = binomial(link = "logit"),
      data = dat, id = id, corstr = "exchangeable")

      Coefficients:
      Estimate Std.err Wald Pr(>|W|)
      (Intercept) 4.75e-01 1.18e-01 1.63e+01 5.4e-05 ***
      x.1 -1.73e+07 3.80e+04 2.08e+05 < 2e-16 ***
      x.3 -2.05e-01 1.50e-01 1.86e+00 0.17
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Estimated Scale Parameters:
      Estimate Std.err
      (Intercept) 0.667 0.0134

      Correlation: Structure = exchangeable Link = identity

      Estimated Correlation Parameters:
      Estimate Std.err
      alpha 0.0187 0.0161
      Number of clusters: 100 Maximum cluster size: 10
      > #--------------------------------

      It is the user's responsibility to chose reasonable models for their data, not the responsibility of a computer program.

      Best,
      Joerg

      Delete
    3. Joerg - thank you for correcting my spelling. I'm not sure where you get the "insinuation" from. I stand by the point that if you are trying to replicate someone's results (including your own!) then it's a darn good idea to run the data on more than one piece of software.
      DG

      Delete
  3. I agree, it is a good idea to replicate other's and one's own work with different packages (or do some other validations, e.g., simulate data that closely resembles some key features of the data at hand and see if the used command/package recovers the parameters correctly, etc.)

    I got the insinuation from the following sentence in your original post:

    "Especially if you favour the Stata package!"

    This sentence in combination with your posted excerpts seem to present the problem as being caused by faulty software when in fact the problem seem to have arisen by researches choosing an inappropriate model.

    Best,
    Joerg

    ReplyDelete
  4. Here's a nice summary and commentary of this from Jeff Pitblado at http://www.stata.com/statalist/archive/2013-09/msg00618.html.

    ReplyDelete
  5. I once had to convince a co-worker that Stata would do this. "But it generated coefficents they must be right." No these and another (infinite) set of coefficients.

    When Stata tells you that the F-Stat is . there is likely something really wrong (also happens when you use a vce with not clusters)

    I don't blame stata. Seems to be more a problem with canned procedures in general.

    ReplyDelete
    Replies
    1. Andrew - fair comment! You need to understand what's in the can, and many users don't even seem to be interested in knowing.

      Delete