What if parallel trends are not so parallel?
Image courtesy of github.com/asheshrambachan/honestdid
Academic's take
In difference-in-differences models, parallel trends are often treated as a make-or-break assumption, and the assumption is not even testable. Why not? One misconception is that comparing treated and control units before treatment constitutes a test of the assumption. In reality, the assumption is not limited to the pretreatment period: it must hold for the entire counterfactual post-treatment period. Comparing trends in the pretreatment period is only a plausibility check. That is, "parallel pretreatment trends" and "parallel counterfactual trends" are not the same. We can observe the former, but we need the latter. So, what we have is not what we want, as is usually the case. Even testing for parallel pretreatment trends alone is tricky because of potential power issues (absence of evidence is not evidence of absence!) and other reasons, including a potential sensitivity to the choice of functional form / transformations (Roth and Sant'Anna, 2023).
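To make the distinction concrete, here is the standard potential-outcomes statement of the assumption (my notation, not taken verbatim from either paper):

```latex
% Parallel trends: for every POST-treatment period t, the untreated
% potential outcome Y_{it}(0) evolves the same way in both groups:
\mathbb{E}\!\left[\, Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 1 \,\right]
  = \mathbb{E}\!\left[\, Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 0 \,\right]
% Pre-treatment comparisons can check the analogous equality only for
% t BEFORE treatment, where Y_{it}(0) is observed in both groups. For
% post-treatment t, the left-hand side involves a counterfactual we
% never observe for the treated group, which is why the assumption
% itself cannot be tested.
```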
Parallel trends is a difficult assumption to meet. Most importantly, treated and control units could just as easily trend in different directions for many reasons. In the absence of an ideal randomization scenario, this may be more likely than observing parallel trends. Treatment may also be correlated with some time-varying confounders, which may cause treated units to deviate from control units after treatment.
A number of studies have looked at this issue. My favorite so far is the work by Ashesh Rambachan and Jonathan Roth, and I am glad to see it finally published. Rambachan and Roth (2023) basically ask: why don't we make a few more assumptions about the trends and settle for partial identification?
I like this because it is simple and clear. The method makes assumptions about the difference in trends observed in the pretreatment period: assume the pre-treatment difference continues as is, or assume it deviates within a range. Based on these additional assumptions about the violation, the method tests the sensitivity of the estimated treatment effect to violations of the parallel trends assumption.
In essence, the method answers the question: How much deviation from parallel trends must there be for the estimated causal effect to disappear? Another way to put it is this: After accounting for the difference in trends that is assumed to continue from the pre-treatment to post-treatment period, is the remaining causal effect still different from zero? So, how do we account for the counterfactual difference in trends that is not due to the treatment? The authors offer several possibilities:
- The post-treatment violation of parallel trends is bounded by a constant (M̄) times the maximum violation observed in the pre-treatment period (i.e., the largest period-to-period difference in pre-treatment trends); a numerical sketch of this restriction follows the list
- The post-treatment violation of parallel trends follows a linear extrapolation of the pre-treatment trend difference, with some deviation allowed (shown as a change in the slope by M in the plot above)
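To see how the first restriction works mechanically, here is a minimal Python sketch that ignores sampling uncertainty and treats the event-study coefficients as known quantities (the actual HonestDiD package also accounts for estimation error and reports robust confidence intervals; the function names and numbers below are invented for illustration):

```python
import numpy as np

def rm_bounds(pre_deltas, post_effect, m_bar):
    """Bounds on the treatment effect under a 'relative magnitudes'
    restriction: the post-treatment violation of parallel trends is at
    most m_bar times the largest violation seen pre-treatment.

    pre_deltas : pre-treatment event-study coefficients, i.e., the
                 estimated trend differences relative to the reference
                 period just before treatment (normalized to zero).
    post_effect: the estimated post-treatment event-study coefficient.
    """
    # Largest period-to-period violation observed before treatment;
    # the reference period's zero is appended so the final pre-period
    # change is counted too.
    deltas = np.append(pre_deltas, 0.0)
    max_pre_violation = np.max(np.abs(np.diff(deltas)))
    # The post-treatment bias is bounded by m_bar * max_pre_violation,
    # so the true effect is only partially identified: it lies in an
    # interval around the point estimate.
    slack = m_bar * max_pre_violation
    return post_effect - slack, post_effect + slack

def breakdown_m_bar(pre_deltas, post_effect):
    """Smallest m_bar at which the identified set includes zero, i.e.,
    how large a violation (relative to the pre-trends) it would take
    for the estimated effect to 'disappear'."""
    deltas = np.append(pre_deltas, 0.0)
    max_pre_violation = np.max(np.abs(np.diff(deltas)))
    return abs(post_effect) / max_pre_violation

# Toy numbers: small pre-trend wiggles, a post-treatment estimate of 5.
pre = np.array([-0.4, 0.3, -0.2])
print(rm_bounds(pre, post_effect=5.0, m_bar=1.0))  # (4.3, 5.7)
print(breakdown_m_bar(pre, post_effect=5.0))       # about 7.1
```

With these toy numbers, the effect only "disappears" if post-treatment violations are allowed to be about seven times larger than anything observed pre-treatment, which is exactly the kind of statement the sensitivity analysis is designed to produce.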
The solution is essentially a constrained optimization problem: we give up point identification, but in return we get robust confidence intervals that remain valid under the assumed violations, which is very useful in practice. The authors provide an R package, HonestDiD (linked above).
Director's cut
In business, running tests/experiments is expensive. Imagine changing store layouts, re-pricing hundreds if not thousands of items, removing items from the assortment and replacing them with other items, running promotions and discounts, or installing self-checkout lanes just to see if they work. This is why every test has a budget (and a set of predetermined stopping conditions). In most cases, budget constraints require tests to be run longer (vs. in more stores or on more products) to collect the number of observations needed for analysis. The test vs. control stores can't always be randomly selected because randomization happens at the market level (to eliminate spillover effects, etc.). The tests end up producing data that is closer to observational data than experimental data. So, measuring the treatment effect requires some serious matching effort to construct a counterfactual and compare trends.
Finding test and control stores that meet all the matching criteria and that have parallel trends is a challenge. A store in Denver, CO may have the same customer demographics as a store in Austin, TX. However, these two stores may not necessarily follow the same trend when it comes to (say) outdoor apparel sales, because the seasons differ. If the parallel trends assumption is violated, the validity of the results is questionable, and the results are generally not acted upon without repeating expensive tests. The paper shared by the academic offers an alternative. De-trending the difference between test and control stores removes the bias to some extent, so insights can still be gathered even if the parallel trends assumption is violated. For example, even if Austin's sales trend differs from Denver's before a promotion because the hiking season starts earlier, we can still gain some understanding of the effect of a spring promotion, as the sketch below illustrates.
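As a toy illustration of that de-trending logic (simulated data with made-up numbers; this mimics the linear-extrapolation idea in its simplest form, with no deviation bound and no inference):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weekly outdoor-apparel sales (simulated). Austin trends up earlier
# than Denver because its hiking season starts sooner, so the
# pre-promotion gap is not flat: parallel trends is violated.
weeks = np.arange(12)          # weeks 0-7 pre-promotion, 8-11 post
promo_start = 8
austin = 100 + 2.0 * weeks + rng.normal(0, 1, 12)  # treated store
denver = 100 + 0.5 * weeks + rng.normal(0, 1, 12)  # control store
austin[promo_start:] += 10.0                       # true promo lift

gap = austin - denver

# Naive DiD: post-minus-pre change in the average gap. This absorbs
# the diverging seasonal trend into the "effect".
naive = gap[promo_start:].mean() - gap[:promo_start].mean()

# De-trended DiD: fit a line to the PRE-period gap, extrapolate it
# into the post period, and attribute only the remaining gap to the
# promotion.
slope, intercept = np.polyfit(weeks[:promo_start], gap[:promo_start], 1)
expected_gap = intercept + slope * weeks[promo_start:]
detrended = (gap[promo_start:] - expected_gap).mean()

print(f"naive DiD estimate:  {naive:.1f}")      # inflated by the trend
print(f"de-trended estimate: {detrended:.1f}")  # close to the true 10
```

Here the naive estimate attributes Austin's steeper seasonal trend to the promotion, while the de-trended estimate recovers something close to the true lift.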
Implications for data centricity
References
- Rambachan, A., & Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5), 2555-2591.
- Roth, J., & Sant'Anna, P. H. (2023). When is parallel trends sensitive to functional form? Econometrica, 91(2), 737-747.