Monday, August 4, 2014

Estimation & Accuracy After Model Selection

This was the title of Brad Efron's invited paper at the 2014 Joint Statistical Meetings in Boston this morning. It was a great presentation, with excellent discussants - Lan Wang, Lawrence Brown, and Soumendra Lahiri.

The paper and discussion are scheduled to appear in the September 2014 issue of JASA.

A lot of what Brad and his discussants had to say related to one of the main points in one of my recent posts. Namely, if you search for a model specification, then this affects all of your subsequent inferences - and usually in a rather complicated way. Typically, even after searching for a preferred model, we tend to "pretend" that we haven't done this, and that the model's form was known from the outset. Naughty! Naughty!

What Brad has done is to address the "pre-test" issue in a rather nice way. You won't be surprised to learn that bootstrapping features heavily in the methodology that he's developed. Using two examples, one non-parametric and one parametric, he showed how to take account of model selection via Mallows' Cp statistic and the lasso, respectively, when constructing regression confidence intervals.
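To make the general idea concrete, here is a minimal sketch (mine, not Brad's procedure) of a bootstrap interval that repeats the model selection step on every resample, so that the selection variability shows up in the interval. Everything in it is an assumption made for illustration: the two candidate models (a constant and a straight line), the simulated data, the plain percentile interval, and the function names. Cp is used in the form RSS + 2p(sigma^2), with sigma^2 estimated from the larger model.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Constant-only model: returns (prediction function, residual sum of squares)."""
    m = statistics.fmean(ys)
    return (lambda x: m), sum((y - m) ** 2 for y in ys)


def fit_line(xs, ys):
    """Simple linear regression by least squares: returns (prediction function, RSS)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample with a single distinct x value
        return fit_mean(xs, ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return (lambda x: a + b * x), rss


def cp_select(xs, ys):
    """Return the prediction function of the candidate with the smaller Cp."""
    n = len(ys)
    f1, rss1 = fit_mean(xs, ys)
    f2, rss2 = fit_line(xs, ys)
    sigma2 = rss2 / (n - 2)       # noise variance estimated from the larger model
    cp1 = rss1 + 2 * 1 * sigma2   # one parameter
    cp2 = rss2 + 2 * 2 * sigma2   # two parameters
    return f1 if cp1 <= cp2 else f2


def bootstrap_interval(xs, ys, x0, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for the fitted value at x0, re-selecting the model each time."""
    rng = random.Random(seed)
    n = len(ys)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample (x, y) pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        preds.append(cp_select(bx, by)(x0))
    preds.sort()
    lo = preds[int(alpha / 2 * n_boot)]
    hi = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated data with a weak slope, so the selected model varies across resamples.
data_rng = random.Random(1)
xs = [i / 39 for i in range(40)]
ys = [1.0 + 0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]

lo, hi = bootstrap_interval(xs, ys, x0=1.0)
print(f"fitted value at x0 = 1.0: {cp_select(xs, ys)(1.0):.3f}")
print(f"95% percentile interval (selection repeated): ({lo:.3f}, {hi:.3f})")
```

The contrast with the "naughty" approach is that the usual interval would be computed from whichever model was chosen on the original data, as if no search had taken place.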

One important feature of his analysis involves "smoothing" the results to take account of the discontinuities that are inherent in model selection. Although it wasn't mentioned in Brad's talk, these discontinuities are the source of some of the most important problems associated with pre-testing in general. For example, traditional pre-test estimators of regression coefficients (based, say, on a prior test of linear restrictions on those coefficients) are inadmissible under a range of standard loss functions. This inadmissibility is entirely due to the fact that these pre-test estimators are discontinuous functions of the random sample data.
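A toy illustration of the discontinuity, and of how averaging over bootstrap draws smooths it, may help. This is a sketch under my own assumptions rather than anything shown in the talk: the problem is estimating a normal mean with known unit variance, the pre-test is a two-sided z-test of a zero mean at the 5% level, and the smoothing is a simple parametric bootstrap average ("bagging") of the pre-test estimator. The function names are hypothetical.

```python
import random
import statistics


def pretest_estimate(xbar, n, crit=1.96):
    """Keep the sample mean only if a z-test rejects mu = 0 (sigma assumed to be 1)."""
    z = xbar * n ** 0.5
    return xbar if abs(z) > crit else 0.0


def smoothed_estimate(xbar, n, n_boot=4000, seed=0):
    """Average the pre-test estimate over parametric bootstrap draws of the sample mean."""
    rng = random.Random(seed)
    se = 1.0 / n ** 0.5
    draws = (pretest_estimate(rng.gauss(xbar, se), n) for _ in range(n_boot))
    return statistics.fmean(draws)


n = 25
threshold = 1.96 / n ** 0.5  # the sample mean at which the test decision flips
below, above = threshold - 0.001, threshold + 0.001

# Move the sample mean a tiny distance across the threshold and compare the responses.
hard_jump = pretest_estimate(above, n) - pretest_estimate(below, n)
smooth_jump = smoothed_estimate(above, n) - smoothed_estimate(below, n)

print(f"change in pre-test estimate: {hard_jump:.4f}")
print(f"change in smoothed estimate: {smooth_jump:.4f}")
```

A shift of 0.002 in the sample mean moves the pre-test estimate from zero to roughly the threshold value, while the bootstrap-averaged version barely moves.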

All in all it was a great session, with some nice take-away quotes:

  • "The discussants actually discussed my paper."
  • "Simulations are hard to do."
  • "Model averaging is perfectly easy to do, but model selection is not."

I took some comfort from the last two of these comments!


    © 2014, David E. Giles

    1 comment:

    1. A 2013 version of the paper can be found here:
      http://statweb.stanford.edu/~ckirby/brad/papers/2013ModelSelection.pdf
