## Wednesday, May 14, 2014

### Interpreting Confidence Intervals

I enjoyed William M. Briggs' ("Statistician to the Stars") post today: "Frequentists are Closet Bayesians: Confidence Interval Edition". Getting your head around the (correct) interpretation of a confidence interval can be difficult for students. Try teaching it - and keeping a straight face! It's a challenge, to be sure. On such days, my Bayesian inclinations percolate to the top.

That being said, I thought it would be worth pointing newcomers to this blog to a post of mine from 2011 that relates closely to what Matt Briggs has to say. The post tells a tale from the early years at the USDA Graduate School, where Jerzy Neyman presented some guest lectures/seminars in the 1930's. You'll find it all here.

Students may take some comfort from the interchange that is reported in the latter part of the post.

Update: In the post I refer to a paper by my colleague, Malcolm Rutherford: "The USDA Graduate School: Government Training in Statistics and Economics, 1921-1945". It's now published in the December 2011 issue of the Journal of the History of Economic Thought (vol. 33, no. 4, pp. 419-447). If you don't have online access to this journal, the Working Paper version of the paper is available here.

1. Dave:

You said in your 2011 post: “Because an estimator is a function of the random data, it's random itself.”

Let’s see. Take the case of the common sample mean. The formula, or procedure, or estimator, is “sum the x’s and divide by the sample size”. With a set of x’s and sample size n one obtains the sample mean: the estimate. Take another set of x’s, repeat the procedure, and one obtains a new estimate. Repeat many times and one obtains the sampling distribution of the sample mean.

However, the procedure, “sum the x’s and divide by the sample size”, remains invariant from sample to sample. Nothing random about it. And that means that the procedure or estimator (X'X)^(-1)X'y remains invariant, but that the estimates, the b’s, vary from sample to sample because the y’s vary from sample to sample. So the b’s are random variables, but the procedure, the estimator, is not.
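The distinction being drawn can be made concrete with a short sketch (the population, its parameters, and the sample size below are all hypothetical, chosen just for illustration): the function `sample_mean` is one fixed procedure, while its realized values vary from random sample to random sample.

```python
import random

random.seed(42)

def sample_mean(xs):
    """The estimator: 'sum the x's and divide by the sample size'.
    This procedure never changes; only its (random) input does."""
    return sum(xs) / len(xs)

# Hypothetical population: normal with mean 5.0, sd 2.0
estimates = []
for _ in range(5):
    xs = [random.gauss(5.0, 2.0) for _ in range(30)]  # a new random sample
    estimates.append(sample_mean(xs))                 # a new realized estimate

print(estimates)  # five different estimates from one fixed procedure
```

Collecting many such realized values traces out the sampling distribution of the sample mean.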

1. Nope, the estimates are "realized values" of the random function we call the estimator. Check any intro stats book.

2. We are not communicating. I am concerned about the precision of the language we use. Agreed that the estimates are the realized values of the estimator.

But the function or the procedure or the formula is not random; one of its arguments is: the random variable y. For the function or formula or procedure to be random, that is, to shift around randomly, there would have to be some other argument, such as an intercept, that moved randomly. But the procedure, “sum the x’s and divide by the sample size”, remains the same for all realizations of the estimator. That is, the (X'X)^(-1)X' never changes from realization to realization.

3. Apparently we're not. You need to read up on the distinction between an estimator and an estimate.

4. To be absolutely clear on this point we need to recognize that we are talking about three different things here. Observational values (or observations) are only random before we know what they are, i.e., before we measure or collect them. In formulae we usually denote such observations with capital letters and refer to an estimator. When we measure/collect (or whatever) the observations and plug them into a formula, we have an estimate. The known values can be denoted by lower-case letters or by real numbers. So, a formula with X's is an estimator if capital X's are used, but an estimate if lower-case x's are used.

5. It looks as though we are debating the meaning of the word "function". To me a function (of x, say) is something that we can plot on a graph, so its value will vary. So I think of an estimator as a function of random variables, and hence also a random variable. It is the formula for the function that doesn't change and so is not a random variable. A function of X will be a random variable if X is a random variable.

2. Dave, I neglected to mention that I indeed did that. I have a good collection of stats, math stat and econometrics texts from my UG and graduate stats courses. What I find, no surprise, is a fair amount of variation in the treatment of the definition (dare I say random treatment?), although the texts always make clear the distinction between estimator and estimate. That is not the issue I raise, although you seem to think so.

A venerable text still in print is Hoel’s Introduction to Mathematical Statistics. Quoting from the 3rd edition, p. 58: “It is customary for some statisticians to use the word estimator for the function and the word estimate for the value of the function after the observational values have been inserted. Thus [sum of x’s over n] would be called the estimator of theta. Other statisticians, however, use the word estimate both for the function and its numerical value.”

While you use the first definition that Hoel gives, as do I, my focus has been on the latter part: “. . . the word estimate for the value of the function after the observational values have been inserted”.

Note that it is the observational values that are random, not the function, which is my point.

Of all the texts that I consulted, 10 in total, Henri Theil in his Principles of Econometrics (1971) seems to have the clearest definition presented in a numerical example on p. 87:

“Note that the numerical value 9.84 is called an estimate of mu, whereas Xbar [the function of the random variables X1, . . . , Xn before the sample is specified numerically] is called an estimator of mu.”

Theil’s and Hoel’s definitions are nearly the same. My point all along is that you term the function as random, whereas I suggest that the function does not vary – its arguments or observational values do, and they are random. I am making a point about your description of the mathematics, not the statistics.

1. Now I understand your point, and I am in total agreement.

3. Dave,

I'd like to have a go at "what you can/can't say" about a CI and maybe you and/or Anonymous can comment.

Anonymous's distinction above between an "estimator" and an "estimate" is very helpful. Let's apply this to a confidence interval.

Notation (and apologies for the absence of subscripts):

D is a dataset in general, "before the sample is specified numerically" (Theil).

[ θL(D), θU(D) ] is an estimator: it is a CI for the parameter θ in general.

D0 is my actual dataset at hand.

[ θL(D0), θU(D0) ] is an estimate: it is the CI I calculate "after the observational values have been inserted" (Hoel).

Or, to use a culinary metaphor, D = ingredient list, [ θL(D), θU(D) ] = recipe, D0 = the actual ingredients I've picked up at the market, [ θL(D0), θU(D0) ] = the dish I've prepared.

Statement #1 (CI as "estimate"):

"I have a dataset D0, and with it I have calculated a 95% confidence interval [ θL(D0), θU(D0) ]. This means there is a 95% probability that θ lies in the interval."

Statement #2 (CI as "estimator"):

"I have a dataset D0, and with it I am planning to calculate a 95% confidence interval [ θL(D), θU(D) ]. This means there is a 95% probability that θ will lie in this interval."

Statement #1 is wrong but Statement #2 is fine.

Or, to continue with the culinary metaphor, the "confidence" relates to the recipe (how likely it is to generate a tasty meal), not the actual dish I've prepared. Either it's tasty or it isn't!
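Mark's Statement #2 can also be illustrated by simulation. A minimal sketch, assuming a normal population with known sigma (all numbers hypothetical): across many repeated datasets, the "recipe" [ θL(D), θU(D) ] covers θ in about 95% of cases, even though any single realized interval, any one "dish", either contains θ or does not.

```python
import random

random.seed(0)
MU, SIGMA, N = 10.0, 3.0, 25   # hypothetical population and sample size
Z = 1.96                       # two-sided 95% normal critical value

hits = 0
TRIALS = 10_000
for _ in range(TRIALS):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    half = Z * SIGMA / N ** 0.5          # known-sigma CI half-width
    if xbar - half <= MU <= xbar + half:  # did this realized interval cover mu?
        hits += 1

coverage = hits / TRIALS
print(coverage)  # close to 0.95: the 95% describes the recipe, not one dish
```

Each pass through the loop produces one realized interval; the 0.95 is a property of the procedure across passes, not of any single pass.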

I hope I got that right. Would be interested to see what you think.

--Mark

1. Mark - I'm happy with Statement #2, as long as we understand that the notion of probability that you are using to get the 95% is "long-run relative frequency".
I like the culinary metaphor - thanks for this!

2. Actually, one of the motivations for setting out #2 in the way that I did was that I wanted to get away from "repeated sampling". If I'm trying to teach this to students, I think talking about repeated realizations would confuse them - I'm afraid they would say, what's the difference between #1 and #2 if both involve CIs "where observational values have been inserted"? And I'd have to explain something that isn't actually central.

What I wanted to get at in #1 vs. #2 was that the key difference is between "estimate" and "estimator". (Hat-tip again to Anonymous!) "95% confidence" applies to the latter, not to the former. The distinction holds even if there is only ever going to be one realization (one dataset). And lots of times in economics we do have only one realization (e.g., one macro dataset for country X in the Bretton Woods era).

Glad you like the culinary metaphor - it's growing on me too. Sadly, my actual culinary skills are less than impressive. The temptation to say that most of my output lies outside the tasty interval is near-irresistible...

--Mark

3. OK - that's fair enough!

4. Mark -- I too like the culinary example. What led me to Dave's post was a discussion on another stats blog (http://wmbriggs.com/) about Bayesian inference and CI's, which prompted me to get more perspectives.

One of the things I realized in that discussion and this one is that we do not typically draw distributions when we teach the CI concept. In fact, in all the texts I still have, there is not one that does so, although I do not remember for those I tossed out. What is done instead is to present a table with 10 to 20 random samples drawn from a given distribution, computing the sample mean and the CI for each, and showing how the true parameter falls outside these intervals 5 or 10% of the time. And then we make the statement that, for the typical case of one sample, the true parameter is either in the interval or it is not. The students nearly always have a hard time with this. I suspect that this is because the verbal discussion is not usually translated into diagrams.

But we know from neuroscience that the computing power of the visual cortex is very large indeed; hence the oft-repeated statement that a picture is worth a thousand words.

I cannot draw the distributions here, but what may be clearer intuitively is to draw the distribution for the population, mark the X axis with the values of the random variable in that distribution, then select a sample and plot the frequency diagram or distribution of that sample on top of the parent distribution diagram. It is easy to show a sample distribution out near the tails whose CI does not contain mu. Do this a few times with other samples, and then point to the diagram, indicating that mu either is in the CI or it is not, and that any one of these may be the student's sample in practice, or on the job using stats.

I don’t know if there are any software programs that do this.
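The picture is, at least, easy to script. A hypothetical sketch (text-based, so it runs anywhere; population, sample size, and window are all made-up numbers): draw repeated samples, render each 95% CI as a horizontal band, and mark where the true mu falls, so it is visible at a glance which intervals miss it.

```python
import random

random.seed(7)
MU, SIGMA, N = 50.0, 10.0, 20   # hypothetical population and sample size
Z = 1.96                        # 95% normal critical value
LO, HI, WIDTH = 35.0, 65.0, 60  # window and width of the text "plot"

def col(x):
    """Clamp a value into a character column of the text axis."""
    return max(0, min(WIDTH - 1, int((x - LO) / (HI - LO) * (WIDTH - 1))))

lines, misses = [], 0
for _ in range(20):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    half = Z * SIGMA / N ** 0.5  # known-sigma CI half-width
    lo, hi = xbar - half, xbar + half
    row = [" "] * WIDTH
    for c in range(col(lo), col(hi) + 1):
        row[c] = "-"             # the interval, as a band of dashes
    row[col(MU)] = "|"           # the true mean's position on every row
    if lo <= MU <= hi:
        lines.append("".join(row))
    else:
        misses += 1
        lines.append("".join(row) + "  <-- misses mu")

print("\n".join(lines))
print(f"{misses} of 20 intervals miss mu")
```

Each row is one sample's realized interval; the vertical bar is mu, fixed in the same column on every row, so students can see that the intervals move while the parameter does not.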