Wednesday, November 14, 2012

Failing the "Sniff Test"

If it looks like garbage, and smells like garbage, it probably is garbage! Insert any four-letter word of your choice, as long as it begins with "S" or "C", in place of "garbage".

HT to my former colleague, Peter Cribbett, for drawing my attention to this little gem:

"I am a trained economist who read the report. While the motive of the politicians was likely impure, the report is not an example of sound science and should have been rejected by the agency’s quality control. The fatal flaw is that the regressions presented fail to meet any of the assumptions required by the statistical method for it to be valid. In technical terms, the control variables are endogenous Second, the ratio of the number of observations, about 70, to the number of control variables is too small to allow the method to work reliably–you should have a ratio of 20 to 30, he has half that or less. Third, he leaves out several important control variables that are correlated with both tax rates and GDP growth, omissions that bias the results. The paper would have zero chance of being published in any serious economic journal, and would not likely even earn a passing grade in any graduate statistics course."
This is a comment by someone who goes by the name of "BB", in response to a post titled The Death of Facts, on the ideafart blog. (The underlining is mine.)
Well, to start with, the comment is totally in line with the title of the post! That's about as charitable as I can be - sorry!

In a follow-up comment, one called "truthiness" responds:
"Can BB provide any reference to back up and justify his assertion that you need a ratio of “20 to 30″?"
For some reason that I can't quite fathom, "BB" has not replied to this challenge!
In the unlikely event that such a response is ever forthcoming, I'd love to see it!
Just as you shouldn't believe everything that you read in the newspapers, you certainly shouldn't believe everything that you read on the internet. Even if it's asserted by a "trained economist". One thing's for sure - that training wasn't acquired in my classroom!

I'm sorry, but I'm afraid there's no "golden rule" that says that the number of observations, "n", should be no less than 20 to 30 times the number of regressors, "k", when we fit a regression model.
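For anyone who wants to check this, here is a minimal Monte Carlo sketch in Python, using only the standard library. Everything in it is an illustrative assumption and has nothing to do with the data in the report under discussion: n = 70 observations, k = 7 regressors (a ratio of 10, half of the supposed minimum), a fixed design matrix, independent standard normal errors, and made-up coefficient values. The helper names solve, ols, and simulate are hypothetical. If the usual assumptions hold, the OLS slope estimator and the error variance estimator s^2 should both be centred on their true values, whatever the ratio of n to k.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    m = len(b)
    # Augmented matrix [A | b]; rows are copied so the inputs are not modified.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def simulate(n=70, k=7, reps=400, seed=2012):
    """Average the first slope estimate and s^2 over repeated samples."""
    rng = random.Random(seed)
    beta = [1.0] + [0.5] * k  # assumed: intercept 1.0, every slope 0.5
    # Fixed design: an intercept column plus k standard normal regressors.
    X = [[1.0] + [rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    sum_b1 = 0.0
    sum_s2 = 0.0
    for _ in range(reps):
        # Fresh errors each replication; true error variance is 1.0.
        y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0)
             for row in X]
        b_hat = ols(X, y)
        resid = [yi - sum(b * x for b, x in zip(b_hat, row))
                 for row, yi in zip(X, y)]
        s2 = sum(e * e for e in resid) / (n - k - 1)  # unbiased variance estimator
        sum_b1 += b_hat[1]
        sum_s2 += s2
    return sum_b1 / reps, sum_s2 / reps


mean_b1, mean_s2 = simulate()
print(f"average slope estimate: {mean_b1:.3f} (true value 0.5)")
print(f"average s^2:            {mean_s2:.3f} (true value 1.0)")
```

What a small ratio of n to k costs you is precision, not validity: the standard errors get larger as the degrees of freedom shrink, but there is no threshold at 20 or 30 below which the method stops "working".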

© 2012, David E. Giles


  1. From random blog reader:

    1. Thanks for the reference. (It's in the "Journal of Educational and Behavioral Statistics", 2004, for those of you who don't have access to JSTOR.)

      I'm sure that Jason and Robyn are really nice people, but their "study" is way too folksy to be persuasive.

      Well, maybe it persuades me that there's some low-level "science" out there.

  2. This reminds me of the "rule" that my students who had their first stats course in the business school try to tell me: so long as n >= 30, asymptotic approximations will always be very accurate.

    1. Oh yes - I hear that one a lot too ... invariably from the demographic you mentioned!

  3. In Regression Modeling Strategies, Frank Harrell has a table that gives a similar (though less stringent) prescription:

    I have not followed up on the papers he cites, but I've found the book to be very reasonable in other areas that overlap with topics I know better.

    1. Thanks Dimitriy. Still not impressed by such an arbitrary rule of thumb, though.

    2. It seems that even prominent econometricians are prone to this, for some reason.

      An example that comes to mind is from the '97 Staiger & Stock weak instruments paper, which suggests that (in the case of a single endogenous regressor) instruments are weak if the first-stage F-statistic is under 10. (A table in Stock and Yogo shows just how dangerous that idea is.)

  4. "I am a trained economist who read the report." - This killed me.
    Dave, there will always be trolls on the internet; we should not bother!

  5. Yep - but sometimes they need to be taken to task.