With all of this emphasis on "Big Data", I was pleased to see this post on the Big Data Econometrics blog, today.
When you have a sample that runs to the thousands (billions?), the conventional significance levels of 10%, 5%, 1% are completely inappropriate. You need to be thinking in terms of tiny significance levels.
I discussed this in some detail back in April of 2011, in a post titled "Drawing Inferences From Very Large Data-Sets". If you're one of those (many) applied researchers who use large cross-sections of data, and then sprinkle the results tables with asterisks to signal "significance" at the 10%, 5% levels, etc., then I urge you to read that earlier post.
It's sad to encounter so many papers and seminar presentations in which the results, in reality, are totally insignificant!
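To see why the conventional levels break down, here's a minimal sketch (my own illustrative numbers, not from the post): for a one-sample test of a mean against zero, the t-statistic grows with the square root of the sample size, so even a trivially small effect of 0.01 standard deviations sails past the usual 5% critical value of about 1.96 once the sample is large enough.

```python
import math

def t_stat(effect, sd, n):
    """t-statistic for testing a sample mean of `effect` against zero.

    Illustrative one-sample setup; the effect size (0.01) and the
    standard deviation (1.0) below are assumptions for the sketch.
    """
    return effect / (sd / math.sqrt(n))

# The same tiny effect becomes "significant" purely by growing n:
for n in (100, 10_000, 1_000_000):
    print(n, round(t_stat(0.01, 1.0, n), 2))
```

At n = 1,000,000 the t-statistic is 10, wildly "significant" at any conventional level, even though the underlying effect is economically negligible.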
Someone - probably a Bayesian - once referred to these as 'tests for sample size'.
Not mentioned in this post or either of the linked posts (I think!) is the Oxford Bulletin paper by David Hendry, Julia Campos and Hans-Martin Krolzig: http://ideas.repec.org/a/bla/obuest/v65y2003is1p803-819.html
They suggest that T^(-0.8) should be used to determine the significance level with larger sample sizes.
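As a quick sketch of what that rule implies (my reading of the comment; the precise recommendation is in the Hendry-Campos-Krolzig paper itself), a significance level of T^(-0.8) shrinks rapidly as the sample grows:

```python
# Significance level shrinking with sample size as alpha = T**(-0.8)
# (illustrative sample sizes chosen for the sketch).
for T in (100, 10_000, 1_000_000):
    alpha = T ** (-0.8)
    print(f"T = {T:>9,}  alpha = {alpha:.6f}")
```

So at T = 100 the rule already implies a level near 2.5% rather than 5%, and at T = 1,000,000 it implies a level of roughly 0.0016% - the "tiny significance levels" the post is talking about.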
So does Big Data mean that we should go back to talking about large-scale structural models which yield multiple testable hypotheses but test those hypotheses jointly rather than individually?
Deirdre McCloskey knew it all along ;)