Are fixed effects really fixed?

Image courtesy of the authors: Bias vs. the standard deviation of temporal unobserved heterogeneity where the heterogeneity follows a random walk

Academic's take

An interesting recent paper titled "Fixed Effects and Causal Inference" by Millimet and Bellemare (2023) examines whether it is reasonable to assume that fixed effects stay fixed over long periods in causal models. The paper highlights a rather obvious but usually overlooked fact: fixed effects may fail to control for unobserved heterogeneity over long time periods.

This makes perfect sense: any effects that are assumed to be fixed (firm characteristics, store attributes, consumer demographics, artistic talent) are more likely to be constant over shorter periods, but may well vary over longer ones. The paper refers to a critical point made by Mundlak (1978):

"It would be unrealistic to assume that the individuals do not change in a differential way as the model assumes [...] It is more realistic to assume that individuals do change differentially but at a pace that can be ignored for short time intervals."

This is essentially a bias-variance tradeoff: as the panel gets longer, the variance of the fixed effects estimator shrinks, but its bias is expected to grow. The paper runs a series of simulations to test the robustness of the fixed effects estimator and proposes an alternative rolling estimator approach for causal identification in panel data models. The figure above compares the bias of the proposed rolling estimator with that of the fixed effects model.[1]
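The tradeoff can be made concrete with a small simulation. The sketch below is not the paper's code or its estimator: it assumes a simple data-generating process in which the unobserved heterogeneity follows a random walk and also drives the regressor, and it uses a hypothetical `rolling_within_estimate` helper (the average of within estimates over short, overlapping windows) as a stand-in for a rolling approach. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_panel(n_units, n_periods, sigma_walk, beta=1.0, seed=0):
    """Simulate y_it = beta * x_it + a_it + e_it (assumed DGP, for illustration).

    a_it is unit-level heterogeneity that follows a random walk with step
    standard deviation sigma_walk; sigma_walk = 0 gives a truly fixed effect.
    x_it depends on a_it, so a_it confounds the effect of x on y.
    """
    rng = np.random.default_rng(seed)
    a_start = rng.normal(size=(n_units, 1))
    steps = rng.normal(scale=sigma_walk, size=(n_units, n_periods))
    a = a_start + np.cumsum(steps, axis=1)
    x = a + rng.normal(size=(n_units, n_periods))
    y = beta * x + a + rng.normal(size=(n_units, n_periods))
    return y, x


def within_estimate(y, x):
    """Fixed effects (within) estimate: OLS slope on unit-demeaned data."""
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x - x.mean(axis=1, keepdims=True)
    return float((x_dm * y_dm).sum() / (x_dm ** 2).sum())


def rolling_within_estimate(y, x, window):
    """Average of within estimates over overlapping windows of length `window`.

    Hypothetical helper: one simple way to keep each estimation window short.
    """
    n_periods = y.shape[1]
    estimates = [
        within_estimate(y[:, s:s + window], x[:, s:s + window])
        for s in range(n_periods - window + 1)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    true_beta = 1.0
    for n_periods in (5, 10, 20, 40):
        y, x = simulate_panel(1000, n_periods, sigma_walk=0.5, beta=true_beta)
        bias_fe = within_estimate(y, x) - true_beta
        bias_roll = rolling_within_estimate(y, x, window=5) - true_beta
        print(f"T={n_periods:>2}  FE bias={bias_fe:.3f}  rolling bias={bias_roll:.3f}")
```

Under these assumptions, the within transformation removes only the unit mean of the heterogeneity, so whatever drift remains is left in the error term. A longer panel leaves more drift behind, while short windows keep it small at the cost of using fewer periods per estimate.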

One important takeaway for causal identification: as a panel gets longer, think more carefully before assuming away unobserved heterogeneity with fixed effects. The paper offers more detailed insights; see the References below.

[1] In the figure, the data-generating process behind the unobserved heterogeneity is simulated as a random walk. If the unobserved heterogeneity instead follows unit-specific time trends (as it is usually modeled, e.g., in Mundlak (1978), Autor (2003), and other recent papers), the increase in bias takes a concave shape. See the appendix of the paper for more details.


Director's cut

The demographics of a market can affect customer demand, response to advertising, or willingness to pay. In analyzing the impact of an intervention, such as a marketing campaign, models usually assume that the demographics of customers within a market remain fairly constant. This is typically a valid assumption, since the analysis period rarely spans years. Post-Covid disruptions, in particular, have limited how much historical data can be used without corrections.

With that said, can we really assume that the demographics of a market do not change over time? Today, rapid gentrification in urban areas is changing the demographics of entire neighborhoods faster than ever before. In a store-level analysis, for example, using fixed effects to control for customer demographics at the neighborhood level is challenging: any observed change in the outcome may be due to the intervention as well as to the changes in customer demographics.


Implications for data centricity

Data centricity is staying true to the data. Including fixed effects to control for unobserved heterogeneity assumes that the heterogeneity is time-invariant. From a data centricity point of view, this means the data on subject characteristics (product, store, customer) are assumed to be fixed. If those data change in reality, fixed effects fail to control for the unobserved confounders. The greater the variation, the further from reality the insights drawn from the data. The paper discussed here shows how and why the gap between assumptions and reality can widen.
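The gap between assumption and reality can be written out in a few lines. This is a sketch in standard panel notation; the symbols are assumptions chosen for illustration and are not taken from the paper.

```latex
\begin{align*}
\text{Assumed:} \quad & y_{it} = \beta x_{it} + \alpha_i + \varepsilon_{it} \\
\text{Possible reality:} \quad & y_{it} = \beta x_{it} + \alpha_{it} + \varepsilon_{it} \\
\text{After demeaning:} \quad & y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i)
  + (\alpha_{it} - \bar{\alpha}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}
```

When the heterogeneity really is fixed, $\alpha_{it} = \alpha_i = \bar{\alpha}_i$ and the middle term vanishes. When it drifts, $\alpha_{it} - \bar{\alpha}_i$ stays in the error and biases the estimate of $\beta$ whenever it is correlated with $x_{it} - \bar{x}_i$.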

References

  • Autor, D. H. (2003). Outsourcing at will: The contribution of unjust dismissal doctrine to the growth of employment outsourcing. Journal of Labor Economics, 21(1), 1-42.
  • Millimet, D., & Bellemare, M. F. (2023). Fixed Effects and Causal Inference.
  • Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69-85.
