Sunday, April 28, 2013

Data Quality is Paramount

Yesterday, in a post on the Worthwhile Canadian Initiative, Frances Woolley rightly drew attention to some rather disturbing issues associated with the upcoming release of the 2011 National Household Survey (NHS), by Statistics Canada. In a nutshell, she asks the question, "How can we be sure that the NHS information about the religious beliefs of Canadians is accurate?"

Recently, I made the comment: "Data - the econometrician's lifeblood! Can't function without it." I wish I'd been more specific, and said "reliable data."

Data quality, and the timeliness of its release, is something that affects us all. And the impact isn't always a positive one. Neither is the problem limited to survey data, of the type that Frances was discussing. It applies equally to time-series data.

Most practising economists are broadly aware of some of the pitfalls associated with working with data, the quality of which may be "mixed" or questionable. One matter for concern, though, is the lack of such awareness among some of our students who routinely use "official" data without questioning its quality, or its applicability to the questions that they are trying to address.

Among the matters that we should be telling our students more about are:

  • Data get revised! All of the time! Don't assume that those GDP figures are going to stay that way.
  • Data "disappear"! Don't assume that your favourite series published by your country's official statistical agency is always going to be available in that form. These agencies have a nasty habit of "discontinuing" time-series data - often without much warning!
  • Data definitions change! Read the footnotes and the fine print associated with published data. It's important to know if there are "breaks" in a time-series resulting from changes in the way it has been defined. Sometimes these breaks are unavoidable, but on other occasions they are just plain irritating! Either way, they affect your analysis.
  • Data are based on estimates! O.K., not all of them - but a lot more than you may think. It's a common fallacy among students that core macroeconomic data are somehow "exact". They're not!
The bottom line(s):
  1. The quality of your data is at least as important as the amount of data you have.
  2. Be as concerned about understanding your data, and its limitations, as you are about understanding the statistical/econometric tools that you intend to use.
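
For students who would like to see the "data get revised" point in action, here is a minimal sketch in Python of how one might compare two vintages of the same series. The numbers are made up purely for illustration (they are not actual published GDP figures), and the function names `revisions` and `mean_absolute_revision` are just hypothetical helpers, not part of any library.

```python
# Hypothetical quarterly growth rates (percent), for illustration only.
# These are NOT actual published figures from any statistical agency.
first_release = {"2012Q1": 1.9, "2012Q2": 1.8, "2012Q3": 0.6, "2012Q4": 0.6}
later_vintage = {"2012Q1": 1.2, "2012Q2": 1.6, "2012Q3": 0.7, "2012Q4": 0.9}


def revisions(old, new):
    """Return {period: new - old} for the periods present in both vintages."""
    return {p: round(new[p] - old[p], 2) for p in old if p in new}


def mean_absolute_revision(old, new):
    """Average size of the revisions, ignoring their sign."""
    revs = revisions(old, new)
    if not revs:
        raise ValueError("the two vintages have no periods in common")
    return sum(abs(r) for r in revs.values()) / len(revs)


revs = revisions(first_release, later_vintage)
for period, r in revs.items():
    print(f"{period}: revised by {r:+.1f} percentage points")

mar = mean_absolute_revision(first_release, later_vintage)
print(f"Mean absolute revision: {mar:.2f} percentage points")
```

With real data, the two dictionaries would be replaced by the same series downloaded on two different dates. If the typical revision is of the same order as the effect you are trying to measure, that is worth knowing before you start estimating anything.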

© 2013, David E. Giles


  1. Spot on! I have the following quote from Griliches (AER, 1985) in my grad lecture notes:
    "Economic data tend to be collected (or often more correctly reported) by firms and persons who are not professional observers and who do not have any stake in the correctness and precision of the observations they report... The encounters between econometricians and data are frustrating and ultimately unsatisfactory, both because econometricians want too much from the data and hence tend to be disappointed by the answers, and because the data are incomplete and imperfect... [M]easurement errors which tend to cancel out when averaged over thousands or even millions of respondents, loom much larger when the individual is the unit of analysis... Thus any serious data analysis has to consider at least two data generation components: the economic behavior model describing the stimulus-response behavior of the economic actors and the measurement model describing how and when this behavior was recorded and summarized. While it is usual to focus our attention on the former, a complete analysis must consider them both."

    While the quote is geared towards micro data, macro needs to worry about measurement error just as much, for the reasons you give.

  2. Or as Josiah Stamp said:
    "The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases."

  3. In times of government austerity, statistics collection is often on the chopping block. Cost savings by discontinuing entire series or decreasing sample size are seen as easy ways to save money. It is important to monitor statistics agencies and sites and let your elected officials know how those budget cuts affect your work and the public.