Sunday, February 10, 2019

Tuesday, February 5, 2019

Misinterpreting Tests, P-Values, Confidence Intervals & Power

There are so many things in statistics (and hence in econometrics) that are easily, and frequently, misinterpreted. Two really obvious examples are p-values and confidence intervals.

I've devoted some space in earlier posts to each of these concepts, and their mis-use. For instance, in the case of p-values, see the posts here and here; and for confidence intervals, see here and here.

Today I was reading a great paper by Greenland et al. (2016) that deals with some common misconceptions and misinterpretations. These arise not only with p-values and confidence intervals, but also with statistical tests in general and with the "power" of such tests. These comments by the authors, from the abstract of their paper, set the tone for what follows rather nicely:
"A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so - and yet these misinterpretations dominate much of the scientific literature." 
The paper then goes through various common interpretations of the four concepts in question, and systematically demolishes them!
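Several of these misinterpretations come down to one distinction. The "95%" in a confidence interval, the 5% significance level, and the power of a test are long-run properties of a *procedure* over hypothetical repeated samples. They are not probabilities about one particular interval or one particular hypothesis. The simulation below is a minimal sketch of that idea and is not taken from Greenland et al. It assumes i.i.d. normal data with a *known* standard deviation, so a simple z-test and z-interval apply. The function names, n = 25, and the alternative μ = 0.5 are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_test_and_ci(sample, mu0, sigma, level=0.95):
    """Two-sided z-test of H0: mu = mu0, plus a z-interval for mu.
    Assumes i.i.d. normal data with KNOWN standard deviation sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - Z.cdf(abs(z)))
    crit = Z.inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level=0.95
    return p_value, (xbar - crit * se, xbar + crit * se)

def simulate(true_mu, mu0=0.0, sigma=1.0, n=25, reps=20000, alpha=0.05, seed=1):
    """Long-run behaviour of the test and the interval over repeated samples.
    Returns (rejection rate of H0, coverage rate of the interval)."""
    rng = random.Random(seed)
    rejections = 0
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        p, (lo, hi) = z_test_and_ci(sample, mu0, sigma, level=1 - alpha)
        rejections += p < alpha        # count rejections at level alpha
        covered += lo <= true_mu <= hi  # count intervals containing the true mu
    return rejections / reps, covered / reps

if __name__ == "__main__":
    print("H0 true   (mu = 0.0):", simulate(true_mu=0.0))
    print("H0 false  (mu = 0.5):", simulate(true_mu=0.5))
```

With `true_mu=0.0` the rejection rate should be close to 0.05 and the coverage close to 0.95. With `true_mu=0.5` the rejection rate estimates the test's power against that one alternative, and the coverage should still be close to 0.95. None of these frequencies is the probability that H0 is true, or the probability that any single computed interval contains μ.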

The paper is extremely readable and informative. Every econometrics student, and most applied econometricians, would benefit from taking a look!


Greenland, S., S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, & D. G. Altman, 2016. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337-350.

© 2019, David E. Giles

Sunday, February 3, 2019

February Reading

Now that Groundhog Day is behind us, perhaps we can focus on catching up on our reading?
  • Desboulets, L. D. D., 2018. A review on variable selection in regression analysis. Econometrics, 6(4), 45.
  • Efron, B. & C. Morris, 1977. Stein's paradox in statistics. Scientific American, 236(5), 119-127.
  • Khan, W. M. & A. u I. Khan, 2018. Most stringent test of independence for time series. Communications in Statistics - Simulation and Computation, online.
  • Pedroni, P., 2018. Panel cointegration techniques and open challenges. Forthcoming in Panel Data Econometrics, Vol. 1: Theory, Elsevier.
  • Steel, M. F. J., 2018. Model averaging and its use in economics. MPRA Paper No. 90110.
  • Tay, A. S. & K. F. Wallis, 2000. Density forecasting: A survey. Journal of Forecasting, 19, 235-254.
© 2019, David E. Giles

Sunday, January 13, 2019

Machine Learning & Econometrics

What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)?

Those are big questions, but I think that they're ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I'd take a long, hard look at what's going on in ML.

Here's a very rough answer - it comes from a post by Larry Wasserman on his (now defunct) blog, Normal Deviate:
"The short answer is: None. They are both concerned with the same question: how do we learn from data?
But a more nuanced view reveals that there are differences due to historical and sociological reasons.......... 
If I had to summarize the main difference between the two fields I would say: 
Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems. 
Machine Learning emphasizes high dimensional prediction problems. 
But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example: 
Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.
Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting. 
But the differences become blurrier all the time........ 
There are also differences in terminology. Here are some examples:
Statistics        Machine Learning
Estimation        Learning
Classifier        Hypothesis
Data point        Example/Instance
Regression        Supervised Learning
Classification    Supervised Learning
Covariate         Feature
Response          Label
Overall, the two fields are blending together more and more and I think this is a good thing."
As I said, this is only a rough answer - and it's by no means a comprehensive one.

For an econometrician's perspective on all of this, you can't do better than to take a look at Frank Diebold's blog, No Hesitations. If you follow up on his posts with the label "Machine Learning" (and I suggest that you do), you'll find 36 of them (at the time of writing).

If (legitimately) free books are your thing, then you'll find some great suggestions for reading more about the Machine Learning / Data Science field(s) on the KDnuggets website - specifically, here in 2017 and here in 2018.

Finally, I was pleased that the recent ASSA Meetings (ASSA2019) included an important contribution by Susan Athey (Stanford), titled "The Impact of Machine Learning on Econometrics and Economics". The title page for Susan's presentation contains three important links to other papers and a webcast.

Have fun!

© 2019, David E. Giles

Friday, January 11, 2019

Shout-out for Mischa Fisher

One of my former grad. students, Mischa Fisher, is currently Chief Economist and Advisor to the Governor of the State of Illinois. In this role he has oversight of a number of State agencies dealing with economics and data science.

This week, he published a really nice blog post, titled "10 Data Science Pitfalls to Avoid".

Mischa is very knowledgeable, and he writes extremely well. I strongly recommend that you take a look at his piece.

© 2019, David E. Giles

Monday, January 7, 2019

Bradley Efron and the Bootstrap

Econometricians make extensive use of various forms of "The Bootstrap", thanks to Bradley (Brad) Efron's pioneering work.
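For readers who haven't used it, here is a minimal sketch of the basic nonparametric bootstrap idea from Efron (1979). Treat the observed sample as a stand-in for the population, resample it with replacement many times, and use the spread of the recomputed statistic to measure its sampling variability. The function names, the choice of the median, the 2,000 resamples, and the simulated exponential data are illustrative assumptions. The simple percentile interval shown is only one of many bootstrap intervals in use.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample `data` with replacement n_boot times,
    recompute `statistic` on each resample, and return the standard deviation
    of those replicates (the bootstrap standard error) plus the replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates), replicates

def percentile_95(replicates):
    """Simple 95% percentile interval: the 2.5th and 97.5th percentiles of the
    replicates (quantiles with n=40 gives cut points at 2.5%, 5%, ..., 97.5%)."""
    cuts = statistics.quantiles(replicates, n=40)
    return cuts[0], cuts[-1]

if __name__ == "__main__":
    # Hypothetical skewed sample, for illustration only.
    data_rng = random.Random(42)
    sample = [data_rng.expovariate(1.0) for _ in range(50)]
    se, reps = bootstrap_se(sample, statistics.median)
    print("sample median:", round(statistics.median(sample), 3))
    print("bootstrap s.e.:", round(se, 3))
    print("95% percentile interval:", percentile_95(reps))
```

The same resampling loop works for other statistics, such as a regression coefficient computed on resampled rows. Dependent or heteroskedastic data usually call for modified schemes, such as block or wild bootstraps, rather than this plain i.i.d. resampling.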

I've posted about the history of the bootstrap previously - e.g., here, and here.

You probably know by now that Brad was awarded The International Prize in Statistics last November. This was only the second time that the prize had been awarded. It's difficult to think of a more deserving recipient.

If you want to read an excellent account of Brad's work, and how the bootstrap came to be, I recommend the 2003 piece by Susan Holmes, Carl Morris, and Rob Tibshirani.

There are some fascinating snippets in this conversation/interview, including:
Efron: "One of the reasons I came to Stanford was because of its humor magazine. I wrote a humor column at Caltech, and I always wanted to write for a humor magazine. Stanford had a great humor magazine, The Chaparral. The first few months I was there, the editor literally went crazy and had to be hospitalized, and so I became editor. For one issue we did a parody of Playboy and it went a little too far. I was expelled from school, ..... I went away for 6 months and then I came back. That was by far the most famous I’ve ever been." 
Referring to his seminal paper (Efron, 1979):
Tibshirani: "It was sent to the Annals. What kind of reception did it get?" 
Efron: "Rupert Miller was the editor of the Annals at the time. I submitted what was the Rietz lecture, and it got turned down. The associate editor, who will remain nameless, said that it didn’t have any theorems in it. So, I put some theorems in at the end and put a lot of pressure on Rupert, and he finally published it."
I guess there's still hope for the rest of us!


Efron, B., 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

Holmes, S., C. Morris, & R. Tibshirani, 2003. Bradley Efron: A conversation with good friends. Statistical Science, 18, 268-281.

© 2019, David E. Giles

Tuesday, January 1, 2019

New Year Reading Suggestions for 2019

With a new year upon us, it's time to keep up with new developments:
  • Basu, D., 2018. Can we determine the direction of omitted variable bias of OLS estimators? Working Paper 2018-16, Department of Economics, University of Massachusetts, Amherst.
  • Jiang, B., Y. Lu, & J. Y. Park, 2018. Testing for stationarity at high frequency. Working Paper 2018-9, Department of Economics, University of Sydney. 
  • Psaradakis, Z. & M. Vavra, 2018. Normality tests for dependent data: Large-sample and bootstrap approaches. Communications in Statistics - Simulation and Computation, online.
  • Spanos, A., 2018. Near-collinearity in linear regression revisited: The numerical vs. the statistical perspective. Communications in Statistics - Theory and Methods, online.
  • Thorsrud, L. A., 2018. Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, online. (Working Paper version.)
  • Zhang, J., 2018. The mean relative entropy: An invariant measure of estimation error. American Statistician, online.
© 2019, David E. Giles

Sunday, December 2, 2018

December Reading for Econometricians

My suggestions for papers to read during December:

© 2018, David E. Giles

Tuesday, November 27, 2018

More Long-Run Canadian Economic Data

I was delighted to hear recently from former grad. student, Ryan Macdonald, who has worked at Statistics Canada for some years now. Ryan has been kind enough to draw my attention to all sorts of interesting items from time to time (e.g., see my earlier posts, here and here).

I always appreciate hearing from him.

His latest email was prompted by my post, A New Canadian Macroeconomic Database.

Ryan wrote:
"I saw your post on long run data and thought you might be interested in a couple of other long-run datasets for your research.  If I remember correctly you are familiar with the GDP/GNI series, Long-run Real Income Estimates. I also added the long-run Bank of Canada commodity price series that go back to 1870 to it.  There is also a dataset for the provinces with estimates going back to 1950 or 1926 depending on the variable: Long-run Provincial and Territorial Data."
Thanks for this information, Ryan. This will be very helpful, and I'd be more than happy to publicize any further such developments.

© 2018, David E. Giles

Thursday, November 22, 2018

A New Canadian Macroeconomic Database

Anyone who's undertaken empirical macroeconomic research relating to Canada will know that there are some serious data challenges that have to be surmounted.

In particular, getting access to long-term, continuous, time series isn't as easy as you might expect.

Statistics Canada has been criticized frequently over the years by researchers who find that crucial economic series are suddenly "discontinued", or are re-defined in ways that make it extremely difficult to splice the pieces together into one meaningful time-series.

In recognition of these issues, a number of efforts have been made to provide Canadian economic data in the forms that researchers need. These include, for instance, Boivin et al. (2010), Bedock and Stevanovic (2017), and Stephen Gordon's ongoing "Project Link".

Thanks to Olivier Fortin-Gagnon, Maxime Leroux, Dalibor Stevanovic, and Stéphane Surprenant, we now have an impressive addition to the available long-term Canadian time-series data. Their 2018 working paper, "A Large Canadian Database for Macroeconomic Analysis", discusses their new database and illustrates its usefulness in a variety of ways.

Here's the abstract: