Causal Inference

Random notes on causal inference.

Panel Data

Difference in Difference

We can impute the untreated potential outcome of the treatment group in the post-treatment period by taking the treatment group's pre-treatment outcome (\(T_{post} = 0\)) and adding the change in the control group between \(T_{post} = 0\) and \(T_{post} = 1\).

\[\begin{split}E[Y_{(0)} | D = 1, T_{post} = 1] = & E[Y | D = 1, T_{post} = 0] + \\ & (E[Y | D = 0, T_{post} = 1] - E[Y | D = 0, T_{post} = 0])\end{split}\]

The imputation is intuitive: we take \(E[Y | D = 1, T_{post} = 0]\) as a baseline and treat the change in the control group as a time trend common to both the treatment group and the control group. This is the parallel trends assumption.
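As a minimal sketch of this imputation, the snippet below adds the control group's change to the treatment group's pre-treatment mean. The function name `impute_counterfactual` and the numbers are hypothetical, chosen only for illustration, and the result is only meaningful if the common-trend assumption holds.

```python
def impute_counterfactual(treat_pre, control_pre, control_post):
    """Impute E[Y_(0) | D=1, T_post=1] from three observed group means.

    Assumes the control group's change over time is the trend the
    treatment group would have followed without treatment.
    """
    trend = control_post - control_pre  # change in the control group
    return treat_pre + trend            # baseline plus common trend


# Hypothetical group means, for illustration only.
y0_hat = impute_counterfactual(treat_pre=10.0, control_pre=8.0, control_post=11.0)
print(y0_hat)  # 13.0
```

Here the control group rose by 3, so the treatment group's imputed untreated outcome is its baseline of 10 plus 3.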

The average treatment effect on the treated (\(ATT\)) is defined as:

\[\begin{split}ATT &= E[Y_{it,(1)} - Y_{it,(0)} | D = 1, T_{post} = 1] \\ &= E[Y_{it,(1)} | D = 1, T_{post} = 1] - E[Y_{it,(0)} | D = 1, T_{post} = 1] \\ &= E[Y | D = 1, T_{post} = 1] - E[Y_{it,(0)} | D = 1, T_{post} = 1]\end{split}\]

The first term is observed, since \(Y_{(1)} = Y\) in the treatment group after treatment, while the second term is the counterfactual quantity we are trying to impute. Substituting the imputed \(E[Y_{(0)} | D = 1, T_{post} = 1]\) into the ATT gives a compact difference-in-difference representation:

\[\begin{split}ATT = & (E[Y | D = 1, T_{post} = 1] - E[Y | D = 1, T_{post} = 0]) - \\ & (E[Y | D = 0, T_{post} = 1] - E[Y | D = 0, T_{post} = 0])\end{split}\]
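The estimator can be sketched in a few lines by replacing each expectation with a sample mean. This is only an illustration under the common-trend assumption: the function name `did_att`, the `(d, t_post, y)` row layout, and the data are all hypothetical.

```python
from statistics import mean


def did_att(rows):
    """Difference-in-difference estimate of the ATT.

    rows: list of (d, t_post, y) tuples, where d is the treatment-group
    indicator and t_post is the post-period indicator. Each of the four
    (d, t_post) cells must contain at least one observation.
    """
    def group_mean(d, t_post):
        return mean(y for (di, ti, y) in rows if di == d and ti == t_post)

    treated_change = group_mean(1, 1) - group_mean(1, 0)
    control_change = group_mean(0, 1) - group_mean(0, 0)
    return treated_change - control_change


# Hypothetical observations, for illustration only.
rows = [
    (1, 0, 9.0), (1, 0, 11.0),   # treated, pre:  mean 10
    (1, 1, 14.0), (1, 1, 16.0),  # treated, post: mean 15
    (0, 0, 7.0), (0, 0, 9.0),    # control, pre:  mean 8
    (0, 1, 10.0), (0, 1, 12.0),  # control, post: mean 11
]
att = did_att(rows)
print(att)  # 2.0
```

The treated group changed by 5 and the control group by 3, so the estimate is 2. With real data the same quantity is often obtained from a regression with a group-by-period interaction term, which also provides standard errors.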