## Wednesday, May 14, 2014

### Interpreting Confidence Intervals

I enjoyed William M. Briggs' ("Statistician to the Stars") post today: "Frequentists are Closet Bayesians: Confidence Interval Edition". Getting your head around the (correct) interpretation of a confidence interval can be difficult for students. Try teaching it - and keeping a straight face! It's a challenge, to be sure. On such days, my Bayesian inclinations percolate to the top.

That being said, I thought it would be worth pointing newcomers to this blog to a post of mine from 2011 that relates closely to what Matt Briggs has to say. The post tells a tale from the early years at the USDA Graduate School, where Jerzy Neyman presented some guest lectures/seminars in the 1930's. You'll find it all here.

Students may take some comfort from the interchange that is reported in the latter part of the post.

Update: In the post I refer to a paper by my colleague, Malcolm Rutherford: "The USDA Graduate School: Government Training in Statistics and Economics, 1921-1945". It's now published in the December 2011 issue of the Journal of the History of Economic Thought (vol. 33, no. 4, pp. 419-447). If you don't have online access to this journal, the Working Paper version of the paper is available here.

1. Dave:

You said in your 2011 post: “Because an estimator is a function of the random data, it's random itself.”

Let’s see. Take the case of the common sample mean. The formula, or procedure, or estimator, is “sum the x’s and divide by the sample size”. With a set of x’s and sample size n one obtains the sample mean: the estimate. Take another set of x’s, repeat the procedure, and one obtains a new estimate. Repeat many times and one obtains the sampling distribution of the sample mean.

However, the procedure, “sum the x’s and divide by the sample size”, remains invariant from sample to sample. Nothing random about it. And that means that the procedure or estimator (X'X)^(-1)X'y remains invariant, but that the estimates, the b’s, vary from sample to sample because the y’s vary from sample to sample. So the b’s are random variables, but the procedure, the estimator, is not.
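The distinction being drawn can be made concrete with a short sketch (the population, its parameters, and the sample size below are all hypothetical, chosen just for illustration): the function `sample_mean` is one fixed procedure, while its realized values vary from random sample to random sample.

```python
import random

random.seed(42)

def sample_mean(xs):
    """The estimator: 'sum the x's and divide by the sample size'.
    This procedure never changes; only its (random) input does."""
    return sum(xs) / len(xs)

# Hypothetical population: normal with mean 5.0, sd 2.0
estimates = []
for _ in range(5):
    xs = [random.gauss(5.0, 2.0) for _ in range(30)]  # a new random sample
    estimates.append(sample_mean(xs))                 # a new realized estimate

print(estimates)  # five different estimates from one fixed procedure
```

Collecting many such realized values traces out the sampling distribution of the sample mean.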

1. Nope, the estimates are "realized values" of the random function we call the estimator. Check any intro stats book.

2. We are not communicating. I am concerned about the precision of the language we use. Agreed that the estimates are the realized values of the estimator.

But the function or the procedure or the formula is not random; one of its arguments is: the random variable y. For the function or formula or procedure to be random, that is, to shift around randomly, there would have to be some other argument, such as an intercept, that moved randomly. But the procedure, “sum the x’s and divide by the sample size”, remains the same for all realizations of the estimator. That is, the (X'X)^(-1)X' never changes from realization to realization.

3. Apparently we're not. You need to read up on the distinction between an estimator and an estimate.

4. To be absolutely clear on this point we need to recognize that we are talking about three different things here. Observational values (or observations) are only random before we know what they are, i.e., before we measure or collect them. In formulae we usually denote such observations with capital letters and refer to an estimator. When we measure/collect (or whatever) the observations and plug them into a formula, we have an estimate. The known values can be denoted by lower-case letters or by real numbers. So, a formula with X's is an estimator if capital X's are used, but an estimate if lower-case x's are used.

5. It looks as though we are debating the meaning of the word "function". To me a function (of x, say) is something that we can plot on a graph, so its value will vary. So I think of an estimator as a function of random variables, and hence also a random variable. It is the formula for the function that doesn't change and so is not a random variable. A function of X will be a random variable if X is a random variable.

2. Dave, I neglected to mention that I indeed did that. I have a good collection of stats, math stat and econometrics texts from my UG and graduate stats courses. What I find, no surprise, is a fair amount of variation in the treatment of the definition (dare I say random treatment?), although the texts always make clear the distinction between estimator and estimate. That is not the issue I raise, although you seem to think so.

A venerable text still in print is Hoel’s Introduction to Mathematical Statistics. Quoting from the 3rd edition, p. 58: “It is customary for some statisticians to use the word estimator for the function and the word estimate for the value of the function after the observational values have been inserted. Thus [sum of x’s over n] would be called the estimator of theta. Other statisticians, however, use the word estimate both for the function and its numerical value.”

While you use the first definition that Hoel gives, as do I, my focus has been on the latter part: “. . . the word estimate for the value of the function after the observational values have been inserted”.

Note that it is the observational values that are random, not the function, which is my point.

Of all the texts that I consulted, 10 in total, Henri Theil in his Principles of Econometrics (1971) seems to have the clearest definition presented in a numerical example on p. 87:

“Note that the numerical value 9.84 is called an estimate of mu, whereas Xbar [the function of the random variables X1, . . . , Xn before the sample is specified numerically] is called an estimator of mu.”

Theil’s and Hoel’s definitions are nearly the same. My point all along is that you term the function as random, whereas I suggest that the function does not vary – its arguments or observational values do, and they are random. I am making a point about your description of the mathematics, not the statistics.

1. Now I understand your point, and I am in total agreement.

3. Dave,

I'd like to have a go at "what you can/can't say" about a CI and maybe you and/or Anonymous can comment.

Anonymous's distinction above between an "estimator" and an "estimate" is very helpful. Let's apply this to a confidence interval.

Notation (and apologies for the absence of subscripts):

D is a dataset in general, "before the sample is specified numerically" (Theil).

[ θL(D), θU(D) ] is an estimator: it is a CI for the parameter θ in general.

D0 is my actual dataset at hand.

[ θL(D0), θU(D0) ] is an estimate: it is the CI I calculate "after the observational values have been inserted" (Hoel).

Or, to use a culinary metaphor, D = ingredient list, [ θL(D), θU(D) ] = recipe, D0 = the actual ingredients I've picked up at the market, [ θL(D0), θU(D0) ] = the dish I've prepared.

Statement #1 (CI as "estimate"):

"I have a dataset D0, and with it I have calculated a 95% confidence interval [ θL(D0), θU(D0) ]. This means there is a 95% probability that θ lies in the interval."

Statement #2 (CI as "estimator"):

"I have a dataset D0, and with it I am planning to calculate a 95% confidence interval [ θL(D), θU(D) ]. This means there is a 95% probability that θ will lie in this interval."

Statement #1 is wrong but Statement #2 is fine.

Or, to continue with the culinary metaphor, the "confidence" relates to the recipe (how likely it is to generate a tasty meal), not the actual dish I've prepared. Either it's tasty or it isn't!
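Mark's Statement #2 can also be illustrated by simulation. A minimal sketch, assuming a normal population with known sigma (all numbers hypothetical): across many repeated datasets, the "recipe" [ θL(D), θU(D) ] covers θ in about 95% of cases, even though any single realized interval, any one "dish", either contains θ or does not.

```python
import random

random.seed(0)
MU, SIGMA, N = 10.0, 3.0, 25   # hypothetical population and sample size
Z = 1.96                       # two-sided 95% normal critical value

hits = 0
TRIALS = 10_000
for _ in range(TRIALS):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    half = Z * SIGMA / N ** 0.5          # known-sigma CI half-width
    if xbar - half <= MU <= xbar + half:  # did this realized interval cover mu?
        hits += 1

coverage = hits / TRIALS
print(coverage)  # close to 0.95: the 95% describes the recipe, not one dish
```

Each pass through the loop produces one realized interval; the 0.95 is a property of the procedure across passes, not of any single pass.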

I hope I got that right. Would be interested to see what you think.

--Mark

1. Mark - I'm happy with Statement #2, as long as we understand that the notion of probability that you are using to get the 95% is "long-run relative frequency".
I like the culinary metaphor - thanks for this!

2. Actually, one of the motivations for setting out #2 in the way that I did was that I wanted to get away from "repeated sampling". If I'm trying to teach this to students, I think talking about repeated realizations would confuse them - I'm afraid they would say, what's the difference between #1 and #2 if both involve CIs "where observational values have been inserted"? And I'd have to explain something that isn't actually central.

What I wanted to get at in #1 vs. #2 was that the key difference is between "estimate" and "estimator". (Hat-tip again to Anonymous!) "95% confidence" applies to the latter, not to the former. The distinction holds even if there is only ever going to be one realization (one dataset). And lots of times in economics we do have only one realization (e.g., one macro dataset for country X in the Bretton Woods era).

Glad you like the culinary metaphor - it's growing on me too. Sadly, my actual culinary skills are less than impressive. The temptation to say that most of my output lies outside the tasty interval is near-irresistible...

--Mark

3. OK - that's fair enough!

4. Mark -- I too like the culinary example. What led me to Dave's post was a discussion on another stats blog (http://wmbriggs.com/) about Bayesian inference and CI's, which prompted me to get more perspectives.

One of the things I realized in that discussion and this one is that we do not typically draw distributions when we teach the CI concept. In fact, in all the texts I still have, there is not one that does so, although I do not remember for those I tossed out. What is done instead is to present a table with 10 to 20 random samples drawn from a given distribution, computing the sample mean and the CI for each, and showing how the true parameter falls outside these intervals 5 or 10% of the time. And then we make the statement that, for the typical case of one sample, the true parameter is either in the interval or it is not. The students nearly always have a hard time with this. I suspect that this is because the verbal discussion is not usually translated into diagrams.

But we know from neuroscience that the computing power of the visual cortex is very large indeed; hence the oft-repeated statement that a picture is worth a thousand words.

I cannot draw the distributions here, but what may be clearer intuitively is to draw the distribution for the population, mark the X axis with the values of the random variable in that distribution, then select a sample and plot the frequency diagram or distribution of that sample on top of the parent distribution diagram. It is easy to show a sample distribution out near the tails whose CI does not contain mu. Do this a few times with other samples, and then point to the diagram, indicating that mu either is in the CI or it is not, and that any one of these may be the student's sample in practice, or on the job using stats.

I don’t know if there are any software programs that do this.
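The picture is, at least, easy to script. A hypothetical sketch (text-based, so it runs anywhere; population, sample size, and window are all made-up numbers): draw repeated samples, render each 95% CI as a horizontal band, and mark where the true mu falls, so it is visible at a glance which intervals miss it.

```python
import random

random.seed(7)
MU, SIGMA, N = 50.0, 10.0, 20   # hypothetical population and sample size
Z = 1.96                        # 95% normal critical value
LO, HI, WIDTH = 35.0, 65.0, 60  # window and width of the text "plot"

def col(x):
    """Clamp a value into a character column of the text axis."""
    return max(0, min(WIDTH - 1, int((x - LO) / (HI - LO) * (WIDTH - 1))))

lines, misses = [], 0
for _ in range(20):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    half = Z * SIGMA / N ** 0.5  # known-sigma CI half-width
    lo, hi = xbar - half, xbar + half
    row = [" "] * WIDTH
    for c in range(col(lo), col(hi) + 1):
        row[c] = "-"             # the interval, as a band of dashes
    row[col(MU)] = "|"           # the true mean's position on every row
    if lo <= MU <= hi:
        lines.append("".join(row))
    else:
        misses += 1
        lines.append("".join(row) + "  <-- misses mu")

print("\n".join(lines))
print(f"{misses} of 20 intervals miss mu")
```

Each row is one sample's realized interval; the vertical bar is mu, fixed in the same column on every row, so students can see that the intervals move while the parameter does not.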