Sunday, January 13, 2019

Machine Learning & Econometrics

What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)?

Those are big questions, but I think that they're ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I'd take a long, hard look at what's going on in ML.

Here's a very rough answer - it comes from a post by Larry Wasserman on his (now defunct) blog, Normal Deviate:
"The short answer is: None. They are both concerned with the same question: how do we learn from data?
But a more nuanced view reveals that there are differences due to historical and sociological reasons ...
If I had to summarize the main difference between the two fields I would say: 
Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems. 
Machine Learning emphasizes high dimensional prediction problems. 
But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example: 
Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.
Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting. 
But the differences become blurrier all the time ...
There are also differences in terminology. Here are some examples:
Statistics        Machine Learning
--------------------------------------
Estimation        Learning
Classifier        Hypothesis
Data point        Example/Instance
Regression        Supervised Learning
Classification    Supervised Learning
Covariate         Feature
Response          Label
Overall, the two fields are blending together more and more and I think this is a good thing."
As I said, this is only a rough answer - and it's by no means a comprehensive one.
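
To make the inference-versus-prediction distinction in that quote a little more concrete, here is a minimal Python sketch. It is an illustration only, not something taken from Wasserman's post: the simulated data, the function names (simulate, fit_ols, confidence_interval, holdout_mse), the normal-approximation interval, and the 80/20 split are all assumptions made for the example. The same simple regression is used twice: once to report a confidence interval for the slope (the emphasis attributed to Statistics), and once to score predictions on held-out data (the emphasis attributed to ML).

```python
import random
from statistics import NormalDist, mean


def simulate(n, beta=2.0, sigma=1.0, seed=0):
    """Draw (x, y) pairs from y = beta * x + noise (a made-up data-generating process)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Simple regression with an intercept: returns (intercept, slope, slope std. error)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = (rss / (n - 2) / sxx) ** 0.5
    return intercept, slope, se_slope


def confidence_interval(slope, se, level=0.95):
    """Normal-approximation interval for the slope: the 'formal inference' view."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope - z * se, slope + z * se


def holdout_mse(xs, ys, train_share=0.8):
    """Fit on the first part of the sample, score predictions on the rest: the 'prediction' view."""
    cut = int(len(xs) * train_share)
    intercept, slope, _ = fit_ols(xs[:cut], ys[:cut])
    errors = [(y - intercept - slope * x) ** 2 for x, y in zip(xs[cut:], ys[cut:])]
    return mean(errors)


xs, ys = simulate(500)
_, slope_hat, se_hat = fit_ols(xs, ys)
low, high = confidence_interval(slope_hat, se_hat)
print(f"slope = {slope_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print(f"hold-out MSE = {holdout_mse(xs, ys):.3f}")
```

With a single covariate the two views use exactly the same fitted line; they differ only in what gets reported. The contrast becomes sharper in the high-dimensional settings the quote mentions, which this toy example does not attempt to cover.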

For an econometrician's perspective on all of this you can't do better than to take a look at Frank Diebold's blog, No Hesitations. If you follow up on his posts with the label "Machine Learning" - and I suggest that you do - then you'll find 36 of them (at the time of writing).

If (legitimately) free books are your thing, then you'll find some great suggestions for reading more about the Machine Learning / Data Science field(s) on the KDnuggets website - specifically, in its posts from 2017 and 2018.

Finally, I was pleased that the recent ASSA Meetings (ASSA2019) included an important contribution by Susan Athey (Stanford), titled "The Impact of Machine Learning on Econometrics and Economics". The title page for Susan's presentation contains three important links to other papers and a webcast.

Have fun!

© 2019, David E. Giles