Comments on Econometrics Beat: Dave Giles' Blog: What NOT To Do When Data Are Missing

2015-10-18T04:36:56.444-07:00

Mistake in the second-to-last line, should be "_increase_ the R-squared (unless y*_n equals the mean of y) and _decrease_ the variance of beta".

2015-10-08T12:16:14.983-07:00

I am not sure what the intuition for a potential decrease in R-squared and variance of beta could be. But here is my argument for a decrease. The imputed y*_n will lie on the regression line. The associated epsilon*_n will be zero. The estimated sigma^2 will be lower (an extra zero term in the sum, but an extra unit in the denominator), which will decrease both the R-squared (unless y*_n equals the mean of y) and the variance of beta. Could you provide (an idea for) an example where the effect would be the opposite?
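The argument above can be checked numerically. What follows is only a sketch on simulated data, not the example from the post: the sample size, the coefficients, the seed, and the helper `ols` are all made up for illustration. It drops the last y, imputes it with the fitted value from the complete-case regression, and refits. In this setup the coefficients and the residual sum of squares are unchanged, s^2 and the standard errors fall, and the R-squared does not fall (the total sum of squares can only grow when an observation is added).

```python
import numpy as np

def ols(X, y):
    """OLS of y on X: returns coefficients, s^2, standard errors, R-squared."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    ssr = resid @ resid
    s2 = ssr / (n - k)                      # usual unbiased estimator of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))     # standard errors of the coefficients
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ssr / tss
    return beta, s2, se, r2

# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Treat the last y as missing: complete-case regression on the first n-1 rows.
beta_cc, s2_cc, se_cc, r2_cc = ols(X[:-1], y[:-1])

# Impute the missing y with its fitted value, then refit on all n rows.
y_imp = y.copy()
y_imp[-1] = X[-1] @ beta_cc
beta_imp, s2_imp, se_imp, r2_imp = ols(X, y_imp)

print("coefficients:", beta_cc, beta_imp)
print("s^2:         ", s2_cc, s2_imp)
print("std. errors: ", se_cc, se_imp)
print("R-squared:   ", r2_cc, r2_imp)
```

The comparison only covers a single y imputed from a regression on the complete cases, with X fully observed; other imputation schemes need not behave the same way.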

2015-10-07T08:29:19.264-07:00

Yes that's right, and any number of X variables. Generally, I'm in favour of imputation, as long as it is done well.

2015-10-07T04:55:27.663-07:00

Thanks for another great post! I'm assuming this result holds for any number of missing Y values provided the data on X are not missing, and of course we don't have n<k?

Also, I was curious about your general feelings on imputing missing data. Are you ever in favor of it? Or do you think what's missing should always be left missing?

Thanks!

2015-10-05T12:01:19.552-07:00

I had always thought the temptation with missing data was to impute missing values of independent variables, so that one does not have to drop a large number of rows. My understanding is that there are a number of open questions in econometrics as to how best to do this, e.g. for principal components analysis of large datasets.

2015-10-03T08:37:03.536-07:00

Not sure that I can. Maybe another reader can help?

2015-10-01T23:14:46.614-07:00

Can you put this in the context of multiple imputation, which accounts for the uncertainty in y*_n? There's a lot of literature arguing for it, but the intuition has always escaped me.
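For readers who have not met multiple imputation: the standard way to combine the results is Rubin's rules. The model is estimated on each of M imputed data sets, the M point estimates are averaged, and the pooled variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. That second term is where the uncertainty in y*_n enters. Below is a minimal sketch of the pooling step only; the function name `pool_rubin` and the input numbers are made up for illustration and do not come from the post.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances using Rubin's rules.

    estimates: one coefficient estimate per imputed data set
    variances: the matching squared standard errors
    """
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical results from M = 3 imputed data sets.
q_bar, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total)
```

With a single deterministic imputation the between-imputation term is absent, so the reported variance reflects only the within part.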

2015-10-01T12:05:57.719-07:00

You have to be careful, yes. The standard errors may be greater or smaller, and the same is true of the R-squared. You can easily check this with an empirical example.

2015-10-01T09:40:27.926-07:00

In addition, one has to be careful interpreting the variance of the estimated regression coefficient(s) and the R-squared if imputation was done before running the estimation. In your example I suspect the R-squared would be spuriously higher and the variance of beta would be lower than it should be.