Sonia Karami and I just had our paper Treatment Effects in Interactive Fixed Effects Models with a Small Number of Time Periods accepted at Journal of Econometrics.
One of the things that I have been very interested in over the past couple of years is trying to identify treatment effect parameters when (i) parallel trends assumptions are violated and (ii) the number of time periods is “small”.
Parallel trends assumptions are very closely related to the following model for untreated potential outcomes:
\[ Y_{it}(0) = \theta_t + \eta_i + U_{it} \] where \(\theta_t\) is a time fixed effect, \(\eta_i\) is an individual fixed effect, and \(U_{it}\) are idiosyncratic time varying unobservables.
But the additive separability between the time-period and unit fixed effects is important here.
cite other IFE papers
cite manski, roth, and try to connect motivation
A lot of my research has involved identifying treatment effect parameters in a difference in differences (DID) framework. For DID, the main identifying assumption is the parallel trends assumption:
Parallel Trends Assumption \[ \mathbb{E}[\Delta Y_t(0) | D=1] = \mathbb{E}[\Delta Y_t(0) | D=0] \]
As discussed above, parallel trends assumptions are very closely related to the following model for untreated potential outcomes:
\[ Y_{it}(0) = \theta_t + \eta_i + U_{it} \] where \(\theta_t\) is a time fixed effect, \(\eta_i\) is an individual fixed effect, and \(U_{it}\) are idiosyncratic time varying unobservables.
An extended version of the parallel trends assumption is the following conditional parallel trends assumption
Conditional Parallel Trends Assumption \[ \mathbb{E}[\Delta Y_t(0) | \tilde{Z}, D=1] = \mathbb{E}[\Delta Y_t(0) | \tilde{Z}, D=0] \] which says that parallel trends holds after conditioning on \(\tilde{Z}\). To give an example, the application in our paper is about job displacement and the outcome is an individual’s earnings. It seems likely that the path of untreated potential outcomes (how outcomes would change over time if an individual were not displaced from their job) likely depends on a person’s education, demographic characteristics, etc. If these are distributed differently across displaced workers and non-displaced workers (which is also likely), then conditioning on these sorts of variables before invoking parallel trends will be important.
This conditional parallel trends assumption is closely related to the following model for untreated potential outcomes \[ Y_{it}(0) = g_t(\tilde{Z}_i) + \eta_i + U_{it} \] where \(g_t\) is a nonparametric, time-varying function of \(\tilde{Z}\), and \(\eta_i\) and \(U_{it}\) are the same as before (the important thing here is the additive separability of \(\eta_i\) which allows for it to be differenced out). It’s common to impose linearity for \(g_t\) to get to \[ Y_{it}(0) = \tilde{Z}_i'\tilde{\delta}_t + \eta_i + U_{it} \] where we take \(\tilde{Z}_i\) to include an intercept so that the time fixed effect is absorbed into \(\tilde{Z}_i'\tilde{\delta}_{t}\) from here on out.
I have been a bit purposely vague about \(\tilde{Z}\) above. What variables need to be conditioned on though is largely a theoretical exercise. The variable that I mentioned above (education and/or demographic characteristics) are commonly observed in many datasets, but one might also think that parallel trends only holds after additionally conditioning on “ability” which is unlikely to be observed in most data.
Let’s partition \(\tilde{Z} = (Z,\lambda)\) where \(Z\) corresponds to the observed components of \(\tilde{Z}\) and \(\lambda\) corresponds to the unobserved components of \(\tilde{Z}\). Similarly, let’s partition \(\tilde{\delta}_t = (\delta_t, F_t)\) where \(\delta_t\) corresponds to the elements in \(Z\) and \(F_t\) corresponds to the elements in \(\lambda\). Plugging this back into the model for untreated potential outcomes above yields \[
Y_{it}(0) = Z_i'\delta_t + \lambda_i'F_t + \eta_i + U_{it}
\] This is an interactive fixed effects model for untreated potential outcomes!
Even if we like the interactive fixed effects model for untreated potential outcomes, it is still not clear if we can recover any causal effect parameters of interest under palatable identifying assumptions.
For one thing, like DID, we’d like to identify causal effect parameters when the number of time periods is small, and much of the interactive fixed effects literature involves arguments where the number of time periods goes to infinity.
To make things concrete, let’s consider the case with 3 time periods: \(t^*\), \(t^*-1\), and \(t^*-2\). And let’s suppose that no one is treated until the last period. We also define \(D_i\) as a variable that it is equal to one for individuals in the treated group (i.e., that become treated in the last period) and is equal to 0 otherwise. Like most of the literature on treatment effects with panel data, we’ll target identifying the average treatment effect on the treated (ATT) which is given by \[ ATT = \mathbb{E}[Y_{t^*}(1) - Y_{t^*}(0) | D=1] \] which is the difference between treated and untreated potential outcomes on average among individuals in the treated group. Our main identification challenge is therefore to recover \(\mathbb{E}[Y_{t^*}(0)|D=1]\).
Towards this end, we use a “quasi-differencing” approach to difference out \(\eta_i\) and \(\lambda_i\) (see, for example, Holtz-Eakin, Newey, and Rosen (1988) and Ahn, Lee, and Schmidt (2013)); that is,
\[ Y_{it^*-1}(0) - Y_{it^*-2}(0) = \lambda_i \big(F_{t^*-1} - F_{t^*-2}\big) + Z_i' \big(\delta_{t^*-1} - \delta_{t^*-2} \big) + U_{it^*-1} - U_{it^*-2} \]
which implies
\[ \lambda_i = \Big( \big( Y_{it^*-1}(0) - Y_{it^*-2}(0) \big) - Z_i' \big(\delta_{t^*-1} - \delta_{t^*-2}\big) - \big(U_{it^*-1}-U_{it^*-2}\big) \Big) \Big/ (F_{t^*-1} - F_{t^*-2}) \]
Similarly, \[ \begin{aligned} Y_{it}(0) - Y_{it^*-2}(0) &= \lambda_i(F_t - F_{t^*-2}) + Z_i'(\delta_t - \delta_{t^*-2}) + U_{it} - U_{it^*-2} \nonumber \\ &= Z_i'\delta^*_t + F^*_t (Y_{it^*-1} - Y_{it^*-2}) + V_{it} \end{aligned} \]
We have a number of extensions to these kind of results in the paper.
Pretty much the same arguments apply in cases where there are more than one interactive fixed effect. The main additional requirement is that, for each interactive fixed effect, we need at least one covariate whose effect on untreated potential outcomes is time invariant
We spend a lot of time thinking about practical issues such as weak instruments, not enough covariates with time invariant effects, and tests for covariates actually having time invariant effects. Except in one or two very pernicious cases, we think that our approach should either work or that one would be able to successfully detect that it is not working.