Tuesday, April 26, 2011

Drawing Inferences From Very Large Data-Sets

It's not that long ago that one of the biggest complaints to be heard from applied econometricians was that there were never enough data of sufficient quality to enable us to do our job well. I'm thinking back to a time when I lived in a country that had only annual national accounts, with the associated publication delays. Do you remember when monthly data were hard to come by, and when surveys were patchy in quality or simply not available to academic researchers? As for longitudinal studies - well, they were just something to dream about!

Now, of course, that situation has changed dramatically. The availability, quality and timeliness of economic data are all light years away from what a lot of us were brought up on. Just think about our ability, now, to access vast cross-sectional and panel data-sets, electronically, from our laptops via wireless internet.

Obviously, things have changed for the better, in terms of our being able to estimate our models and provide accurate policy advice. Right? Well, before we get too complacent, let's think a bit about what this flood of data means for the way in which we interpret our econometric/statistical results. Could all of these data possibly be too much of a good thing?