Friday, September 9, 2016

Spreadsheet Errors

Five years ago I wrote a post titled "Beware of Econometricians Bearing Spreadsheets".

The takeaway message from that post was simple: there's considerable, well-documented evidence that spreadsheets are very, very dangerous when it comes to statistical calculations. That is, if you care about getting the right answers!

Read that post, and the associated references, and you'll see what I mean.
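To make "dangerous for statistical calculations" a little more concrete, here is one classic way a numerical routine can go wrong: computing the sample variance with the one-pass "calculator" formula, (sum(x^2) - (sum(x))^2 / n) / (n - 1), which subtracts two huge, nearly equal numbers when the data have a large mean and a small spread. The Python sketch below contrasts it with the two-pass formula and with the standard library's `statistics.variance`. The data set and function names are illustrative assumptions, and this is not a claim about how any particular spreadsheet computes its variance.

```python
import statistics

def one_pass_variance(xs):
    """Sample variance via the one-pass 'calculator' formula.
    Numerically fragile: it subtracts two large, nearly equal sums."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def two_pass_variance(xs):
    """Sample variance via the two-pass formula: centre on the mean first."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Illustrative data: a huge mean (about 1e9) and a tiny spread.
# The exact sample variance of 1e9 + {1, 2, 3} is 1.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]

print("one-pass:           ", one_pass_variance(data))
print("two-pass:           ", two_pass_variance(data))   # 1.0
print("statistics.variance:", statistics.variance(data))
```

With a true variance of exactly 1, the two-pass and standard-library results come out as 1.0, while the one-pass result is swamped by rounding error. Same data, same arithmetic hardware; only the algorithm differs.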

(You might also ask yourself, why would I pay big bucks for commercial software that is of questionable quality when I can use high-quality statistical software such as R, for free?)

This week, a piece in The Economist looks at the shocking record of publications in genomics that fall prey to spreadsheet errors. It's a sorry tale, to be sure. I strongly recommend that you take a look.

Yes, any software can be misused. Anyone can make a mistake. We all know that. However, it's not a good situation when a careful and well-informed researcher ends up making blunders just because the software they trust simply isn't up to snuff!

  1. The "errors" reported by the genomics study are due to researcher laziness, not to the software per se. Any veteran Excel user has run into the annoying General default format and should know to change the format to Text. I suspect these lists of gene names were copied and pasted into Excel without changing the format, and the user simply didn't bother to proofread the list. I'm not a big fan of Excel, but this kind of thing is very different from calculation errors due to second-rate algorithms, or even the problems caused by complicated formulas.

    1. Maybe, but you should still check my earlier post and the well-documented algorithmic flaws in Excel.

    2. I never understood why highly educated people would use Excel for anything beyond very basic tasks. In fact, any program that is meant to be driven through a point-and-click interface rather than by writing code is a very bad choice. STATA and MATLAB are costly, I understand, but the time you save using them for publication purposes is enormous, if only because table and plot formats need to be coded just once!

      If not, as you said, why not R? It's free, and there are even forums, videos, and courses, either cheap or free, if you need to learn it. Damn it, there is no excuse for using Excel or something similar in this day and age!

      Btw, how is R for microeconometrics? From what I gather, macroeconomists use it more often than microeconomists, who seem to prefer STATA.
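The first comment blames gene symbols being silently turned into dates when pasted into a column left in Excel's General format (for example, SEPT2 showing up as 2-Sep), plus a skipped proofreading step. As a minimal sketch of that proofreading step, the Python below flags entries in an exported gene list that look like converted dates. The helper name, the sample list, and the assumption that converted cells appear as "D-Mon" strings are all hypothetical; the displayed date format depends on the spreadsheet and locale, so the pattern would need adjusting for real data.

```python
import re

# Assumption: converted cells are exported as "D-Mon" strings such as "2-Sep".
# Other locales/formats (e.g. "Sep-02" or ISO dates) would need extra patterns.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)

def flag_date_like(symbols):
    """Hypothetical helper: return (row_index, value) pairs for entries
    that look like spreadsheet dates rather than gene symbols."""
    return [(i, s) for i, s in enumerate(symbols) if DATE_LIKE.match(s.strip())]

# Hypothetical gene list after a careless paste into a General-format column.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "EGFR"]
print(flag_date_like(genes))  # [(1, '2-Sep'), (3, '1-Mar')]
```

A check like this only catches the damage after the fact; setting the column to Text before pasting avoids it in the first place.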

