Sunday, March 30, 2014

Understanding the Underlying Assumptions

From time to time I've been known to blog about the importance of fully understanding the assumptions that underlie the various estimators and tests that we use in econometrics. (Here, too.) Gee, I've even gone so far as to suggest that students should learn about these assumptions by taking courses where the results are proved formally - not introduced simply through arm-waving.

I'm not going to start griping about all of that again here - it's too nice a Spring day for that.

However, I've just been reading a recent piece in Scientific American that's relevant to my main concern when students are taught "how to do" econometrics, but don't have a proper understanding of the underlying assumptions. That concern is simply that, sooner or later, they'll screw up!

Maybe it won't be the end of the world. The economy probably won't collapse in a big messy heap. Perhaps they'll just lose their job!

The S.A. article was about just this sort of thing - but in the case of neuroscientists, not economists. For the sake of full disclosure, I have nothing at all against neuroscientists. In fact, I have a daughter who is doing post-grad. work in just that field at the Florey Institute in Australia.

You can read the article for yourself, and I hope that you will. In a nutshell, numerous influential neuroscience studies that have appeared in the very top scientific journals have been based on fundamentally flawed statistical analysis.

To put it really simply, the authors have used statistical tests whose validity requires that the data have been sampled independently, when in fact this requirement is undeniably violated in these studies.
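A quick simulation conveys just how badly things can go wrong. The sketch below (all parameter values are hypothetical, chosen only for illustration) generates clustered data - observations that share a common cluster-level shock - and then applies an ordinary t-test that assumes independence. A valid 5% test should reject a true null hypothesis about 5% of the time; with the dependence ignored, the rejection rate is far higher.

```python
# Hypothetical illustration: the effect of ignoring within-cluster dependence
# on the size of a nominal 5% t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_clusters, per_cluster = 20, 10      # 200 observations in 20 clusters
n_sims, rejections = 2000, 0

for _ in range(n_sims):
    cluster_shock = rng.normal(0, 1, n_clusters)            # shared within each cluster
    noise = rng.normal(0, 1, (n_clusters, per_cluster))     # idiosyncratic component
    y = (cluster_shock[:, None] + noise).ravel()            # dependent sample, true mean = 0
    _, p = stats.ttest_1samp(y, 0.0)                        # test assumes independence
    rejections += p < 0.05

print(f"Empirical size of nominal 5% test: {rejections / n_sims:.3f}")
```

With an intra-cluster correlation of 0.5 and 10 observations per cluster, the "design effect" is 1 + 9(0.5) = 5.5, so the naive test rejects a true null far more often than 5% of the time.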

Oh dear!

To quote Gary Stix, the author of the article:
'Emery N. Brown, a professor of computational neuroscience in the department of brain and cognitive sciences at the MIT-Harvard Division of Health Sciences and Technology, points to a dire need to bolster the level of statistical sophistication brought to bear in neuroscience studies. “There’s a fundamental flaw in the system and the fundamental flaw is basically that neuroscientists don’t know enough statistics to do the right things and there’s not enough statisticians working in neuroscience to help that." ' 
I'd venture to guess that "screwing up" in the neurosciences might have some unpleasant consequences.

Needless to say, I've sent the link to the S.A. article to my daughter!

You might want to think about this the next time you fire up your favourite econometrics package: Did your friendly econometrics instructor make sure that you really understand the assumptions that need to be satisfied before you can rely on the estimators and tests you're about to use?



© 2014, David E. Giles

4 comments:

  1. Excellent article, Dave.
    We always need to think about the basics.

    Your daughter is in good hands.

    ReplyDelete
  2. The example of dependent observations is excellent! I have recently been looking for approaches to dealing with dependent observations, and I found it difficult to find anything "usable". In that sense, I think the problem is perhaps not only researchers assuming independence, but also the lack of appropriate literature pointing to alternatives ("appropriate" here meaning: for statistically educated non-statisticians who do not want to read theoretical papers). I got the impression that standard texts deal with violations of assumptions only when there is an easy fix they can present as well.

    Let me give you recent examples of what I have been looking for. In the linear setting, when people like me think about dependencies in the data, variance adjustments come to mind (cluster-robust variance estimation, etc.), but are there other issues involved? Trying to find information about this in any standard textbook left me deeply frustrated, to be honest, as I found the topic to be almost completely absent. Another example: how do I estimate a probit model when the observations are dependent? Clearly, the likelihood needs to be set up differently - i.e., the joint density can't be written as the product of the marginal densities - but are there other issues involved? Can you find any examples of how to do this? Again, trying to find information about this in any textbook (or on the internet) leaves you clueless, and the longer you look, the more you see the value of just brushing over it by assuming independence, which everyone else seems to be doing anyway.

    Having said that, I would appreciate any pointers to relevant papers / books / websites / or maybe even old or new blog posts. ;)

    ReplyDelete
  3. One comment about that particular paper: its main point is that if you have some form of panel data, you should adjust for the within-unit correlation. This seems to make sense, and is standard in econometrics, I think. But that paper seems to suggest that whenever there is clustering, a multilevel model should be used. While I have never really understood what a multilevel model actually is, I have come to believe that in econometric language it is a random effects estimator. (Please correct me here if I am wrong!) I think that at least some people would disagree, and argue that cluster-robust standard errors are "sufficient" to account for clustering. If I may ask, what is your take on this, Dave?

    ReplyDelete
    Replies
    1. Adjusting the standard errors to allow for cluster (random) sampling is fine, but there is a more fundamental problem being referred to. There is a specific dependency in the sampling, and this needs to be modelled - e.g., by using a hierarchical model. An analogy would be if you were using MLE and you set up the likelihood function as if the data were independent - i.e., LF = product of the individual data densities - but in fact the data are dependent. You fail to model the dependency, so the LF is mis-specified, and the MLEs of the parameters will be inconsistent, etc. Everything goes out of the window. This is the type of situation that is causing concern - not the fact that a "complex survey" has been used to gather the data.
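      A small simulation along these lines (all numbers hypothetical, chosen only for illustration) makes the point concrete. Binary outcomes are generated from a probit model with a cluster-level random effect, and a pooled probit likelihood - the product of the marginal densities, as if the observations were independent - is then maximized. A standard result is that the pooled estimator converges to the attenuated coefficient β/√(1 + σ_u²), not the structural β:

      ```python
      # Hypothetical illustration: maximizing a mis-specified (independence)
      # likelihood when the data are dependent within clusters.
      import numpy as np
      from scipy.optimize import minimize
      from scipy.stats import norm

      rng = np.random.default_rng(1)
      n_clusters, per, beta_true = 400, 25, 1.0
      u = rng.normal(0, 1, n_clusters)                  # cluster effect, ignored below
      x = rng.normal(0, 1, (n_clusters, per))
      ystar = beta_true * x + u[:, None] + rng.normal(0, 1, x.shape)
      y = (ystar > 0).astype(float)
      xf, yf = x.ravel(), y.ravel()

      def negloglik(b):
          # Mis-specified LF: product of marginal probit densities,
          # written as if the 10,000 observations were independent.
          p = np.clip(norm.cdf(b * xf), 1e-12, 1 - 1e-12)
          return -np.sum(yf * np.log(p) + (1 - yf) * np.log(1 - p))

      b_hat = minimize(negloglik, x0=[0.0]).x[0]
      print(b_hat)   # near beta_true / sqrt(1 + 1) ≈ 0.71, not 1.0
      ```

      Here σ_u = 1, so the pooled MLE settles near 1/√2 ≈ 0.71 rather than the structural value of 1.0 - the dependency has to be modelled, not just corrected for in the standard errors.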

      Delete