Recently, it was my distinct pleasure to review a first-class book by David Harville, titled Linear Models and the Relevant Distributions and Matrix Algebra.
Here is what I had to say:
"A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so - and yet these misinterpretations dominate much of the scientific literature."
"The short answer is: None. They are both concerned with the same question: how do we learn from data?
But a more nuanced view reveals that there are differences due to historical and sociological reasons...
If I had to summarize the main difference between the two fields I would say:
Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems.
Machine Learning emphasizes high dimensional prediction problems.
But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example:
Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.
Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting.
But the differences become blurrier all the time...
There are also differences in terminology. Here are some examples:
Statistics → Machine Learning
Data point → Example/Instance
Regression → Supervised Learning
Classification → Supervised Learning
Overall, the two fields are blending together more and more and I think this is a good thing."

As I said, this is only a rough answer, and it's by no means a comprehensive one.
Efron: "One of the reasons I came to Stanford was because of its humor magazine. I wrote a humor column at Caltech, and I always wanted to write for a humor magazine. Stanford had a great humor magazine, The Chaparral. The first few months I was there, the editor literally went crazy and had to be hospitalized, and so I became editor. For one issue we did a parody of Playboy and it went a little too far. I was expelled from school... I went away for 6 months and then I came back. That was by far the most famous I've ever been."

Referring to his seminal paper (Efron, 1979):
Tibshirani: "It was sent to the Annals. What kind of reception did it get?"
Efron: "Rupert Miller was the editor of the Annals at the time. I submitted what was the Rietz lecture, and it got turned down. The associate editor, who will remain nameless, said that it didn't have any theorems in it. So, I put some theorems in at the end and put a lot of pressure on Rupert, and he finally published it."

I guess there's still hope for the rest of us!
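For readers who haven't met it, Efron (1979) is the paper that introduced the bootstrap. The core idea is disarmingly simple: resample your data with replacement and recompute your statistic many times to approximate its sampling distribution. Here is a minimal sketch of a percentile bootstrap confidence interval (the function name, the toy data, and the default settings are my own illustration, not anything from the paper):

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples `data` with replacement n_boot times, recomputes `stat`
    on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of those replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    boots = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy sample; the interval should bracket the sample mean (2.91).
data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.0]
lo, hi = bootstrap_ci(data)
print(lo, hi)
```

The appeal, and presumably what irked that associate editor, is that no theorems are needed to *use* it: the same dozen lines work for the mean, the median, or any statistic you can compute on a resample.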
"I saw your post on long-run data and thought you might be interested in a couple of other long-run datasets for your research. If I remember correctly you are familiar with the GDP/GNI series, Long-run Real Income Estimates. I also added to it the long-run Bank of Canada commodity price series, which goes back to 1870. There is also a dataset for the provinces with estimates going back to 1950 or 1926 depending on the variable: Long-run Provincial and Territorial Data."