Thursday, December 5, 2013

Econometrics and "Big Data"

In this age of "big data" there's a whole new language that econometricians need to learn. Its origins are diverse: statistics, data mining, machine learning, and that nebulous area called "data science".

What do you know about such things as:
  • Decision trees 
  • Support vector machines
  • Neural nets 
  • Deep learning
  • Classification and regression trees
  • Random forests
  • Penalized regression (e.g., the lasso, LARS, and elastic nets)
  • Boosting
  • Bagging
  • Spike and slab regression?

Probably not enough!

If you want some motivation to rectify things, a recent paper by Hal Varian will do the trick. It's titled, "Big Data: New Tricks for Econometrics", and you can download it from here. Hal provides an extremely readable introduction to several of these topics.

He also offers a valuable piece of advice:
"I believe that these methods have a lot to offer and should be more widely known and used by economists. In fact, my standard advice to graduate students these days is 'go to the computer science department and take a class in machine learning'."
Interestingly, my son (a computer science grad.) "audited" my classes on Bayesian econometrics when he was taking machine learning courses. He assured me that this was worthwhile - and I think he meant it! Apparently there's the potential for synergies in both directions.

© 2013, David E. Giles


  1. The pdf file on Hal Varian's site is corrupt and cannot be opened.

  2. Thanks for linking to the Hal Varian paper. I can't wait to read it over our upcoming break. I actually proposed teaching a data mining/data science course for my department last year. I did blog something similar, although much more long-winded, a while back: