Posts

Causal inference is not about methods

Image courtesy of Eleanor Murray - epiellie.com

Solo post: Academic's take

Causal modeling is becoming more popular after some notorious failures of overly optimistic reliance on black-box predictive models to solve business problems (e.g., Zillow's iBuying). This is great. We are also increasingly seeing the introduction of new methods that "solve" causal inference. This is not so good, because it misdirects attention: causal inference has more to do with data and assumptions than with methods. No method can "solve" the causal inference problem (although it can help by reducing bias). If anything, regression is one of the most common methods for causal inference, and it is an effective one when all else is in order. This is different from predictive modeling, where brute-force bias reduction using the most complex method may succeed.

Price elasticity of demand problem

Simply put, we want to know how demand will change ...
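A minimal sketch of the elasticity regression the teaser alludes to, using invented data: in a log-log specification, the OLS slope is the (constant) elasticity, and it is only causal because price variation here is random by construction. All numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical data: recover a price elasticity of demand from a
# log-log regression. Price variation is exogenous by construction,
# which is exactly the "all else is in order" condition.
rng = np.random.default_rng(42)

n = 500
true_elasticity = -1.5
log_price = rng.uniform(0.0, 2.0, n)                  # log of observed prices
# log(demand) = intercept + elasticity * log(price) + noise
log_demand = 5.0 + true_elasticity * log_price + rng.normal(0.0, 0.1, n)

# The OLS slope in a log-log model is the elasticity estimate. With real
# data, confounding (e.g., demand shocks driving prices) would bias this
# same regression -- the method is fine; the assumptions do the work.
slope, intercept = np.polyfit(log_price, log_demand, 1)
print(f"estimated elasticity: {slope:.2f}")           # close to -1.5
```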

How a supposedly data-centric decision cost Walgreens $200 million and how to avoid it

Image courtesy of Rob Klas - bloomberg.com

Introduction to the business case

The business case discussed in this post was reported by Bloomberg under the title "Walgreens Replaced Fridge Doors With Smart Screens. It’s Now a $200 Million Fiasco". You can find the article here. In summary, a startup promised Walgreens that its high-tech fridges would track shoppers and spark an in-store advertising revolution. The project then failed miserably for a number of reasons. Most importantly, Walgreens faced a backlash when customers ended up seeing their reflections in dark, mirror-like screens instead of the drinks once visible through the glass doors. Store associates rushed to put signs on the coolers explaining which drinks were in which cooler. The project went so badly that it ended in a lawsuit between the startup and Walgreens; not only did it fail to deliver business value, it resulted in losses in customer satisfaction, employee morale, and revenue. But why was this allowe...

Explaining the unexplainable Part II: SHAP and SAGE

Image courtesy of iancovert.com

Academic's take

This post is an exception that closes the loop we opened with our post on LIME. Starting in 2025, we're changing the scope of Data Duets to focus on business cases (as opposed to methods). Having discovered an excellent write-up that explains both SHAP (SHapley Additive exPlanations) and SAGE (Shapley Additive Global importancE), I will focus on the why questions, possible links to counterfactuals and causality, and the implications for data centricity.

Why do we need SHAP and SAGE?

The need for explainability methods stems from the fact that most ensemble methods are black boxes by nature: they minimize prediction error, but they obscure how the predictions are made. This is problematic because some predictions hold only as long as the underlying conditions remain favorable. For example, Zillow's iBuying was a notorious failure, likely due to a lack of clarity and transparency in how the predictions ...
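A from-scratch sketch of SHAP's core "local accuracy" property, not the shap library itself: for a linear model with independent features, the SHAP value of feature i reduces to coef_i * (x_i - E[x_i]), and the attributions sum exactly to the prediction minus the average prediction. The model and values are invented for illustration.

```python
import numpy as np

# Toy "black box" that happens to be linear, so exact SHAP values have a
# closed form: phi_i = coef_i * (x_i - E[x_i]) under independent inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
coefs = np.array([2.0, 3.0])

def f(X):
    return X @ coefs              # model to explain

baseline = f(X).mean()            # E[f(X)], the "no information" prediction
x = np.array([1.0, -0.5])         # instance to explain

# Exact SHAP values for an additive/linear model
phi = coefs * (x - X.mean(axis=0))

# Local accuracy: attributions + baseline recover the prediction exactly
print("SHAP values:", phi)
print("check:", phi.sum() + baseline, "==", f(x[None, :])[0])
```

For nonlinear ensembles there is no closed form, which is why the library approximates Shapley values by averaging over feature coalitions; the additive recovery property above is what those approximations preserve.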

Explaining the unexplainable Part I: LIME

Image courtesy of finalyse.com

Academic's take

Every model is a simplified version of reality, as it must be. That is fine as long as we know and understand how reality is simplified and reduced to the parameters of a model. In predictive analytics, where nonparametric models are heavily used with a kitchen-sink approach of adding any and all features to improve predictive performance, we don't even know how a model simplifies reality. So, what if we use another model to simplify and explain the nonparametric predictive model? This other model is called a surrogate model, and it is designed to be interpretable. In short, surrogate models explore the boundary conditions of decisions made by a predictive model.

What is a surrogate model?

Surrogate models can help us understand (i) the average prediction of a model (global surrogate) or (ii) a single prediction (local surrogate). The quest then becomes finding surrogate models that can explain the predictions ...
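A minimal local-surrogate sketch in the spirit of LIME, not the lime package itself: explain one prediction of a nonlinear "black box" by fitting a proximity-weighted linear model around the instance. The black box, kernel width, and sample counts are all illustrative assumptions.

```python
import numpy as np

# Stand-in nonparametric model: nonlinear in both features
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(1)
x0 = np.array([0.5, 1.0])                      # instance to explain

# 1. Perturb around x0 and query the black box
Z = x0 + rng.normal(scale=0.3, size=(2000, 2))
y = black_box(Z)

# 2. Weight samples by proximity to x0 (Gaussian kernel)
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.3 ** 2))

# 3. Fit a weighted linear surrogate; its slopes approximate the black
#    box's local behavior at x0 (cos(0.5) ~ 0.88 and 2 * 1.0 = 2.0)
A = np.column_stack([np.ones(len(Z)), Z - x0])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
print("local slopes:", beta[1:])
```

The weights are what make the surrogate *local*: far-away perturbations barely influence the fit, so the linear coefficients describe the decision surface only in the neighborhood of x0.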

How to (and not to) log transform zero

Image courtesy of the authors: Survey results of the papers with log zero in the American Economic Review

Academic's take

Log transformation is widely used in linear models for several reasons: making data "behave" or conform to parametric assumptions, calculating elasticity, and so on. The figure above shows that nearly 40% of the empirical papers in a selected journal used a log specification, and 36% had the problem of the log of zero. When an outcome variable naturally has zeros, however, log transformation is tricky. In most cases, the instinctive solution is to add a positive constant to each value of the outcome variable. One popular idea is to add 1 so that raw zeros remain zeros after transformation. Another is to add a very small constant, especially if the scale of the variable is small. Well, the bad news is that these are all arbitrary choices that bias the resulting estimates. If a model is correlational, a small bias due to the transformation may not be a big con...
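A quick simulation of the arbitrariness problem with invented data: when the outcome has zeros, the slope from regressing log(y + c) on x moves with the arbitrary constant c, so "just add a small number" is not an innocuous fix.

```python
import numpy as np

# Count outcome with a true coefficient of 0.5 and many raw zeros
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 * x))
print("share of zeros:", (y == 0).mean())

# The log(y + c) slope depends on the choice of c
slopes = {}
for c in (1.0, 0.1, 0.001):
    slopes[c], _ = np.polyfit(x, np.log(y + c), 1)
    print(f"c = {c:>5}: slope = {slopes[c]:.3f}")
# The estimates disagree with one another (and with 0.5) purely because
# of c; a Poisson/PPML regression on the raw counts sidesteps the issue.
```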

Are fixed effects really fixed?

Image courtesy of the authors: Bias vs. the standard deviation of temporal unobserved heterogeneity, where the heterogeneity follows a random walk

Academic's take

An interesting recent paper titled "Fixed Effects and Causal Inference" by Millimet and Bellemare (2023) discusses the feasibility of assuming fixed effects are fixed over long periods in causal models. The paper highlights the rather obvious but usually overlooked fact that fixed effects may fail to control for unobserved heterogeneity over long time periods. This makes perfect sense: any effects that are assumed to be fixed (firm characteristics, store attributes, consumer demographics, artistic talent) are more likely to be constant over shorter periods, but may well vary over longer ones. The paper refers to a critical point made by Mundlak (1978): "It would be unrealistic to assume that the individuals do not change in a differential way as the model assumes [...] It is more realistic to as...
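A simulated sketch of the paper's concern (all numbers invented, not the paper's design): the within (demeaning) estimator removes unit heterogeneity only when it is truly constant. When the "fixed" effects drift as a random walk and are correlated with the regressor, the within estimator stays biased.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 500, 20, 1.0

def within_estimate(drift_sd):
    alpha = rng.normal(size=(N, 1))                   # unit effects at t = 0
    drift = rng.normal(scale=drift_sd, size=(N, T)).cumsum(axis=1)
    a = alpha + drift                                 # possibly time-varying
    x = a + rng.normal(size=(N, T))                   # x correlated with a
    y = beta * x + a + rng.normal(size=(N, T))
    xd = x - x.mean(axis=1, keepdims=True)            # demean within unit
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()

b_fixed = within_estimate(0.0)
b_rw = within_estimate(0.5)
print("constant effects:   ", b_fixed)    # ~ 1.0 (unbiased)
print("random-walk effects:", b_rw)       # noticeably above 1.0
```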

What if parallel trends are not so parallel?

Image courtesy of github.com/asheshrambachan/honestdid

Academic's take

In difference-in-differences models, parallel trends are often treated as a make-or-break assumption, and the assumption is not even testable. Why not? One misconception is that comparing the treated and control units before treatment is a test of the assumption. In reality, the assumption is not limited to the pretreatment period; it covers the entire counterfactual. Comparing trends in the pretreatment period is only a plausibility check. That is, "parallel pretreatment trends" and "parallel counterfactual trends" are not the same. We can observe the former, but we need the latter. So, what we have is not what we want, as is usually the case. Even testing for parallel pretreatment trends alone is tricky because of potential power issues (absence of evidence is not evidence of absence!) and other reasons, including a potential sensitivity to the choice of functional form / transformations ...
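A deterministic toy example of the former/latter distinction, with made-up numbers: pretreatment trends are exactly parallel, yet the counterfactual trends diverge after treatment, so difference-in-differences reports a nonzero effect even though the true effect is zero.

```python
import numpy as np

T0, T1 = 5, 5                       # pre- and post-treatment periods
t = np.arange(T0 + T1)

control = 1.0 * t
# Treated counterfactual: same slope pre-treatment, steeper slope
# post-treatment even WITHOUT any treatment (diverging counterfactual)
treated_cf = 1.0 * t + np.where(t >= T0, 0.5 * (t - T0 + 1), 0.0)
true_effect = 0.0
treated = treated_cf + true_effect

pre_gap = treated[:T0] - control[:T0]
did = (treated[T0:].mean() - treated[:T0].mean()) - (
    control[T0:].mean() - control[:T0].mean())
print("pre-treatment gap variance:", pre_gap.var())   # 0.0: pretrends parallel
print("DiD estimate:", did)                           # 1.5 despite zero effect
```

The pretrend "test" passes perfectly here; only the unobservable post-treatment counterfactual reveals the violation, which is exactly why pretrend comparisons are plausibility checks rather than tests.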