Synthetic control method in the wild

Image courtesy of dataingovernment.blog.gov.uk

Academic's take

Synthetic data is increasingly popular across the board. From deep learning to econometrics, artificially generated data is used for a number of purposes. In deep learning, one such use case is training neural network models. In econometrics, synthetic data has recently found another use: creating control groups in observational studies for causal inference.

Synthetic data + control groups: synthetic controls. That is too generic a name for a specific method. In this post, I will focus on the synthetic control approach developed by Abadie et al. (2010, 2014) and, earlier, Abadie & Gardeazabal (2003). Athey and Imbens (2017) describe Abadie et al.'s work as "arguably the most important innovation in the policy evaluation literature in the last 15 years."

Why is it needed?

Measuring the causal effect of a treatment requires a counterfactual (what would've happened if the treatment had not occurred) to compare with what is observed following the treatment (what really happened). The causal effect is the difference between the two (what really happened with the treatment minus what would've happened without it). For example, a price promotion is applied to a product and the resulting demand is observed. Let's say demand increased from 100 to 110 units. What would the demand have been if the promotion had not been applied? We will never know unless we can observe two parallel universes. In the absence of a direct observation of both outcomes, suppose we use another product, free of the promotion, to serve as the counterfactual. For that comparison to capture the causal effect, the two products must not differ in any other way that affects demand except for the promotion. This is difficult to establish (factors other than the promotion can easily affect demand).

What is it?

Formally developed in the early 2000s and articulated further in Abadie et al. (2010, 2014) and Abadie (2021), synthetic controls offer a way to generate an artificial control unit from multiple actual control units. This artificial, or synthetic, control is shown to serve well as a counterfactual even if none of the actual control units involved in the synthesis serves as a good counterfactual for the treated unit on its own.

How is it done?

Three steps to synthesize the control:

  1. Choose predictor variables that explain the outcome in the pre-treatment period (and that are observed for both the untreated units and the treated unit). Include some lagged values of the outcome variable in the predictor set. These variables should be time-invariant (if a predictor is a time series, calculate and use its pre-treatment mean).
  2. Identify possible untreated units to synthesize into a control to serve as a counterfactual.
  3. Use an objective function to find a weight for each untreated unit and a weight for each predictor. Let $\textbf{X}_1$ be a $(R+T_0) \times 1$ vector of pre-treatment characteristics for the treated unit. Similarly, $\textbf{X}_0$ is a $(R+T_0) \times J$ matrix of the same variables for the $J$ untreated units. Abadie et al. (2010) minimize the following distance measure:
    $$||\textbf{X}_1-\textbf{X}_0\textbf{W}||_{\textbf{V}} = \sqrt{(\textbf{X}_1-\textbf{X}_0\textbf{W})'\textbf{V}(\textbf{X}_1-\textbf{X}_0\textbf{W})}$$
    The calculation produces two outputs: the matrix $\textbf{V}$, which holds the weights on the predictor variables (so each predictor enters the minimization according to how well it predicts the pre-treatment outcome), and a $J \times 1$ vector $\textbf{W}$ of non-negative weights (summing to one) for the control units. A sketch of this weight search in code follows this list.
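
Below is a minimal sketch of that weight search, assuming the predictor weights $\textbf{V}$ are fixed in advance (e.g., an identity matrix or inverse predictor variances). Abadie et al. choose $\textbf{V}$ in an outer optimization that is omitted here, and the helper name `fit_donor_weights` is illustrative rather than part of any package.

```python
import numpy as np
from scipy.optimize import minimize

def fit_donor_weights(X1, X0, v=None):
    """Find donor weights W that minimize ||X1 - X0 W||_V.

    X1 : (k,) pre-treatment characteristics of the treated unit
    X0 : (k, J) the same characteristics for the J donor units
    v  : (k,) diagonal of V (importance of each predictor); defaults to ones
    """
    k, J = X0.shape
    v = np.ones(k) if v is None else np.asarray(v, dtype=float)

    def loss(w):
        diff = X1 - X0 @ w
        return float(diff @ (v * diff))          # (X1 - X0 W)' V (X1 - X0 W)

    w0 = np.full(J, 1.0 / J)                     # start from equal weights
    res = minimize(
        loss, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * J,                 # weights are non-negative
        constraints=[{"type": "eq",              # ... and sum to one
                      "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x
```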
Four steps to validate the control:
  1. Evaluate the pre-treatment period goodness of fit. This is an evaluation of how closely the synthesized control follows the treated unit before the treatment. A time series plot is a good start.
  2. Conduct placebo tests. The method can be repeated for every unit in the donor pool exactly as it is done for the treated unit, generating placebo synthetic controls. Plot all the time series. The expectation is that the placebos do not behave similarly to the treated unit (i.e., the post-treatment gap between the treated unit and its synthetic control stands out relative to the placebos' gaps).
  3. Calculate a permutation (Fisher-style exact) p-value. For each unit, compute the ratio of the post-treatment to pre-treatment mean squared prediction error. If the treatment has no effect, post-treatment prediction errors resemble pre-treatment ones and the ratio is close to 1; for a genuine effect, the treated unit's ratio should be large relative to the placebos' (see the sketch following this list).
  4. Conduct sensitivity analyses (e.g., re-estimate the synthetic control leaving out one donor at a time to check that no single donor drives the result).
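
As a companion to steps 1 and 3, here is a small sketch of the fit diagnostics, assuming the treated unit's outcome series and its synthetic counterpart are already available as arrays; the function names are illustrative.

```python
import numpy as np

def mspe(actual, synthetic):
    """Mean squared prediction error between two outcome series."""
    actual, synthetic = np.asarray(actual), np.asarray(synthetic)
    return float(np.mean((actual - synthetic) ** 2))

def post_pre_mspe_ratio(y_treated, y_synthetic, t0):
    """Post-treatment MSPE divided by pre-treatment MSPE.

    t0 is the index of the first post-treatment period. A ratio near 1
    suggests no effect; a large ratio for the treated unit (relative to
    the placebos) suggests a genuine effect.
    """
    pre = mspe(y_treated[:t0], y_synthetic[:t0])     # step 1: fit quality
    post = mspe(y_treated[t0:], y_synthetic[t0:])
    return post / pre                                # step 3: test statistic
```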

Director's cut

In retail, one of the questions we continuously find ourselves answering is whether a specific intervention worked. Success is usually measured with one primary and several secondary KPIs: a lift in units, sales dollars, sales margin, average basket size (or units per transaction, a.k.a. UPT), average basket value (a.k.a. AOV), and/or new customer acquisition.

To make a causal inference, we need to know what would've happened in the absence of the intervention to compare with what happened following it. Comparing pre- vs. post-implementation or year-over-year metrics does not measure the effect of the intervention; it measures the effect of the intervention plus everything else that changed in between.

The ideal way to measure such interventions is to design randomized controlled trials. Through randomization we could at least assume that everything else changed in the same way in our treatment and control stores, markets, or customers. However, in some cases, while implementing an intervention, we may hit roadblocks that limit our ability to randomize:
  1. We may not have the budget to implement the intervention in a sufficiently large sample size. For example, if the plan is to test a new store layout, the initial "proof of concept" budget will likely allow changing the layouts of a limited number of stores.
  2. The intervention may be implemented in only one market (or a small subset of all markets). For example, when testing a new promotion, we may focus on one (or a few) market(s) to create a consistent marketing message and customer experience.
  3. The intervention will affect all markets, stores, customers, or product lines. A policy change, such as matching the prices of competitors, when announced, will be accessible to all customers. A subset of stores or customers can't be used as a control in this case.
In these cases, to measure the effect of the intervention, we lean on quasi-experimental (or observational) methodologies. When the treated units are sparse and cannot be matched with an existing control group, the synthetic control method comes to the rescue.

What is a synthetic control?


A synthetic control is essentially a prediction. The method predicts what the performance would've been if the treatment didn't happen. To predict that performance, it uses untreated subjects (donors), or other time series, that are correlated with the treated unit's outcome but not affected by the intervention. These donors, when combined, mimic the pre-treatment characteristics of the treated group. If the characteristics are similar enough, the post-treatment performance of the combined donors can be used as a counterfactual.

To achieve that goal, the synthetic control method assigns a weight to each donor using pre-treatment data. Even though it's an oversimplification of the methodology, you can think of a synthetic control as a weighted average of your donors' KPIs. The same weights are then applied post-treatment to calculate the counterfactual performance. The treatment effect is calculated by comparing the actual outcome with the counterfactual (the synthetic control) over the implementation period, as in the small sketch below.
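
Here is a small numerical illustration of that weighted-average view, with made-up weekly sales figures and hand-picked weights; it is a toy example, not output from a fitted model.

```python
import numpy as np

donor_sales = np.array([              # weekly units for three donor stores
    [100, 102,  98, 101, 103, 105],
    [ 80,  82,  81,  79,  85,  84],
    [120, 118, 121, 119, 123, 122],
])
weights = np.array([0.5, 0.3, 0.2])   # donor weights from the pre-treatment fit
treated_sales = np.array([98, 99, 98, 110, 114, 112])
t0 = 3                                # promotion starts in week 4

synthetic = weights @ donor_sales     # counterfactual ("no promotion") series
lift = treated_sales[t0:] - synthetic[t0:]
print("estimated weekly lift:", lift.round(1))   # actual minus counterfactual
```

In a real analysis the weights come from the optimization described in the academic's take above, not from hand-picked values.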

Modeling advice


A few key considerations while constructing your synthetic control and selecting donors: 
  1. If the intervention is at the market level, make sure the donors selected are not impacted by another market level intervention (such as a marketing test) or a disruption (such as a hurricane).
  2. If the intervention is at a product line level, make sure complementary or substitute SKUs are not selected as donors. The intervention could steal share from the substitutes or lift the sales of complements. Adding either one of them to the donor pool would result in biased estimates.


Implications for data centricity

Data centricity means being faithful to the data. In problems that require identification of a causal effect, the fundamental problem is the lack of data on the counterfactual. The synthetic control method provides a way to generate data that is ideally close to what the real (but unobservable) data would have been. In this method, the counterfactual is constructed as a weighted average of the potential comparison units, with weights chosen so that the synthetic unit best resembles the characteristics of the treated unit(s) over an extended pre-treatment period. The credibility of a synthetic control depends on the degree of fit, but unfortunately there are no ex ante guarantees of fit. In fact, if the fit is poor, Abadie et al. (2010) recommend against the use of synthetic controls.

In synthetic controls, data centricity is tested through sensitivity analyses. Abadie et al. (2010) provide a permutation distribution generated by iteratively reassigning the treatment to the units in the donor pool and estimating a "placebo effect" at each iteration. The effect of the treatment on the unit of interest is considered significant if the effect size is sufficiently extreme relative to the permutation distribution. In other words, data centricity is achieved by a process of elimination, i.e., by ensuring that none of the placebo treatment-synthetic control pairs produces a comparable effect, and thus that the synthetic control is a good proxy for the counterfactual data.
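
A sketch of that permutation logic, assuming the post/pre MSPE ratio from the earlier sketch is used as the test statistic and the placebo ratios have already been computed; all names are illustrative.

```python
import numpy as np

def post_pre_mspe_ratio(actual, synthetic, t0):
    """Post/pre mean squared prediction error ratio for one unit."""
    actual, synthetic = np.asarray(actual), np.asarray(synthetic)
    pre = np.mean((actual[:t0] - synthetic[:t0]) ** 2)
    post = np.mean((actual[t0:] - synthetic[t0:]) ** 2)
    return post / pre

def permutation_p_value(treated_ratio, placebo_ratios):
    """Share of units (treated included) whose ratio is at least as extreme."""
    all_ratios = np.append(np.asarray(placebo_ratios, dtype=float), treated_ratio)
    return float(np.mean(all_ratios >= treated_ratio))

# e.g., permutation_p_value(12.0, [1.1, 0.8, 2.3, 1.5]) returns 0.2:
# only 1 of 5 units has a ratio as extreme as the treated unit's.
```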

Implementation
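
The snippet below is a minimal, self-contained end-to-end sketch on simulated data, using only numpy and scipy. It is meant to illustrate the mechanics rather than replace established implementations such as the authors' Synth package for R; all names and simulated numbers are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# --- simulate 40 weeks of a KPI for 1 treated unit and 8 donors -------------
T, T0, J = 40, 30, 8                        # total weeks, pre-period, donors
donors = 100 + np.cumsum(rng.normal(0, 1, size=(J, T)), axis=1)
true_w = rng.dirichlet(np.ones(J))          # the treated unit tracks the donors...
treated = true_w @ donors + rng.normal(0, 0.5, size=T)
treated[T0:] += 5.0                         # ...plus a post-treatment lift of 5

# --- fit donor weights on pre-treatment outcomes only -----------------------
X1, X0 = treated[:T0], donors[:, :T0].T     # here the predictors are lagged outcomes

def loss(w):
    diff = X1 - X0 @ w
    return float(diff @ diff)

res = minimize(loss, np.full(J, 1.0 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
w = res.x

# --- build the synthetic control and estimate the effect --------------------
synthetic = w @ donors
gap = treated - synthetic
print("pre-treatment RMSPE :", np.sqrt(np.mean(gap[:T0] ** 2)).round(2))
print("avg post-trt effect :", gap[T0:].mean().round(2))   # should be near 5
```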



References

  • Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1): 113–32.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490): 493–505.
  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2014. “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59(2): 495–510.
  • Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59(2): 391–425.
  • Athey, Susan, and Guido W. Imbens. 2017. “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives 31(2): 3–32.

