Econometrics Beat: Dave Giles' Blog: P-Values, Statistical Significance, and Logistic Regression

Thursday, September 19, 2013

P-Values, Statistical Significance, and Logistic Regression

Yesterday, William M. Briggs ("Statistician to the Stars") posted on his blog a piece titled "How to Mislead With P-values: Logistic Regression Example".

Here are some extracts which, hopefully, will encourage you to read the post:

"It’s too easy to generate “significant” answers which are anything but significant. Here’s yet more—how much do you need!—proof. The pictures below show how easy it is to falsely generate “significance” by the simple trick of adding “independent” or “control variables” to logistic regression models, something which everybody does...............

Logistic regression is a common method to identify whether exposure is “statistically significant”. .... (The) Idea is simple enough: data showing whether people have the malady or not and whether they were exposed or not is fed into the model. If the parameter associated with exposure has a wee p-value, then exposure is believed to be trouble.

So, given our assumption that the probability of having the malady is identical in both groups, a logistic regression fed data consonant with our assumption shouldn’t show wee p-values. And the model won’t, most of the time. But it can be fooled into doing so, and easily. Here’s how.

Not just exposed/not-exposed data is input to these models, but “controls” are, too; sometimes called “independent” or “control variables.” These are things which might affect the chance of developing the malady. Age, sex, weight or BMI, smoking status, prior medical history, education, and on and on. Indeed models which don’t use controls aren’t considered terribly scientific.

Let’s control for things in our model, using the same data consonant with probabilities (of having the malady) the same in both groups. The model should show the same non-statistically significant p-value for the exposure parameter, right? Well, it won’t. The p-value for exposure will on average become wee-er (yes, wee-er). Add in a second control and the exposure p-value becomes wee-er still. Keep going and eventually you have a “statistically significant” model which “proves” exposure’s evil effects. Nice, right?"
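If you'd like to try this sort of experiment for yourself, here's a minimal sketch in Python. To be clear, this is not Briggs' code. The sample size (60), the 30% chance of the malady, the pure-noise standard normal "controls", the use of a Wald test, and the function names `solve`, `exposure_p_value`, and `rejection_rate` are all my own assumptions, chosen only for illustration. The malady is generated independently of exposure, so any "significant" exposure coefficient is a false positive.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def exposure_p_value(X, y, iters=25):
    """Fit a logit by Newton-Raphson; return the two-sided Wald p-value
    for the coefficient on column 1 (exposure). Column 0 is the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    # hess was evaluated just before the final (tiny) step, so it is
    # effectively the information matrix at the estimate.
    e1 = [1.0 if j == 1 else 0.0 for j in range(k)]
    var = solve(hess, e1)[1]
    z = beta[1] / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(n_controls, n_obs=60, n_sims=200, alpha=0.05, seed=1):
    """Share of simulated samples in which exposure looks 'significant',
    even though the malady is generated independently of everything."""
    rng = random.Random(seed)
    hits = done = 0
    for _ in range(n_sims):
        X = [[1.0, float(rng.random() < 0.5)]
             + [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
             for _ in range(n_obs)]
        y = [float(rng.random() < 0.3) for _ in range(n_obs)]
        try:
            p = exposure_p_value(X, y)
        except (ZeroDivisionError, ValueError, OverflowError):
            continue  # skip degenerate samples where the fit breaks down
        done += 1
        hits += p < alpha
    return hits / max(done, 1)


if __name__ == "__main__":
    for n_controls in (0, 4, 8):
        rate = rejection_rate(n_controls)
        print(f"{n_controls} noise controls: rejection rate = {rate:.3f}")
```

If Briggs' claim carries over to this setup, the rejection rate should drift above the nominal 5% as noise controls are added. With only 200 replications the estimates are themselves quite noisy, so increase `n_sims` before drawing any conclusions.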

Oh yes - don't forget to read the responses/comments on Briggs' post, too.
