Saturday, January 9, 2016

Difference-in-Differences With Missing Data

This brief post is a "shout out" for Irene Botosaru (Economics, Simon Fraser University), who gave a great seminar in our department yesterday.

The paper that she presented (co-authored with Federico Gutierrez) is titled "Difference-in-Differences When the Treatment Status is Observed in Only One Period". So, the title of this post is a bit of an abbreviation of what the paper is really about.

When we conduct DID analysis, we need to be able to classify information about the behaviour/characteristics of survey respondents into a 4-way matrix. Specifically we need to be able to observe the respondents before and after a "treatment"; and in each case we need to know which respondents were treated, and which ones were not.

Usually, a true panel of data, observed at two or more time-periods, facilitates this.
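To make the 4-way matrix concrete, here is a minimal Python sketch of the conventional DID calculation from the four cell means. The outcome values are invented purely for illustration, and the names `cells` and `did_estimate` are hypothetical; in practice the same quantity is usually obtained from a regression with an interaction term.

```python
from statistics import mean

# Hypothetical outcomes, invented purely for illustration.
# Each key is one (group, period) cell of the 4-way matrix.
cells = {
    ("treated", "before"): [10.0, 12.0, 11.0],
    ("treated", "after"): [15.0, 17.0, 16.0],
    ("control", "before"): [9.0, 10.0, 11.0],
    ("control", "after"): [11.0, 12.0, 13.0],
}


def did_estimate(cells):
    """Difference-in-differences computed from the four cell means."""
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[("treated", "after")] - m[("treated", "before")]
    change_control = m[("control", "after")] - m[("control", "before")]
    # The treatment effect is the treated group's change net of the
    # change that the control group experienced over the same period.
    return change_treated - change_control


estimate = did_estimate(cells)
print(estimate)
```

All four cells have to be observable for this calculation to go through.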

However, what if we simply have repeated cross-sections of data, taken at different time-periods? In this case we aren't necessarily observing exactly the same respondents when we look at the cross-sections for two different time-periods. Typically, in the cross-section after the treatment we'll know which respondents were treated and which ones weren't. However, there will be no way of partitioning the respondents in the pre-treatment cross-section into "subsequently treated" and "not treated" groups.

Two of the four cells in the matrix of information that we need will be missing, so conventional DID can't be performed.
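The missing cells can be illustrated with a small Python sketch. The records below are made up, and the field names (`period`, `treated`, `y`) and the helper `cell_means` are hypothetical. Treatment status is recorded only in the post-treatment cross-section, so it is set to `None` for every pre-treatment respondent.

```python
from statistics import mean

# Hypothetical repeated cross-sections (made-up numbers).
# Treatment status is recorded only in the post-treatment survey.
records = [
    {"period": "before", "treated": None, "y": 10.0},
    {"period": "before", "treated": None, "y": 12.0},
    {"period": "before", "treated": None, "y": 9.0},
    {"period": "after", "treated": True, "y": 16.0},
    {"period": "after", "treated": True, "y": 15.0},
    {"period": "after", "treated": False, "y": 12.0},
]


def cell_means(records):
    """Mean outcome per (treated, period) cell; None if the cell is empty."""
    out = {}
    for treated in (True, False):
        for period in ("before", "after"):
            ys = [
                r["y"]
                for r in records
                if r["period"] == period and r["treated"] is treated
            ]
            out[(treated, period)] = mean(ys) if ys else None
    return out


means = cell_means(records)
# Cells that cannot be formed from the data at hand.
missing = [cell for cell, value in means.items() if value is None]
print(missing)
```

Both "before" cells come back empty, even though the pooled pre-treatment outcomes are fully observed. That is exactly why the conventional calculation breaks down here.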

This is the problem that Irene and Federico consider.

A natural response is to introduce some sort of proxy variable(s) to deal with the missing data, and of course this will introduce an estimation bias, even asymptotically. This paper basically takes this approach. The result is a GMM estimation strategy, together with a test that the underlying assumptions are satisfied.

This is a really nice paper - well motivated, technically solid, and with a nice empirical example and application. I urge you to take a look at it if DID is in your econometrics tool-kit (and even if it's not!)

I'm sure that Irene and Federico would appreciate hearing about situations where you've encountered this missing data problem, and how you've responded to it.


© 2016, David E. Giles

1 comment:

  1. I have more than three periods. For example, 1980, 1981, 1982, and 1983.
    I have six groups: A, B, C, D, E, F. I want to check the effectiveness of the minimum drinking age. For example, group A has a minimum age of 16 both before and after the policy. Group B increases the age from 16 to 17. Group C increases the age from 16 to 18. Group D has changed the age at 17. Group E increases the age from 17 to 18. Group F has a minimum age of 18 both before and after the policy.

    The data is panel data.
    Is there anyone who can help me to design the DID model in such a case?
    How many treatment effects do I need to find?

    Thanks in advance.
