Explaining the unexplainable Part II: SHAP and SAGE

Image courtesy of iancovert.com

Academic's take

This is a follow-up to our post on LIME, and a post I'm writing on request. Having discovered an excellent write-up that explains both SHAP (SHapley Additive exPlanations) and SAGE (Shapley Additive Global importancE), I will focus on the why questions, possible links to counterfactuals and causality, and the implications for data centricity.

Why do we need SHAP and SAGE?

The need for explainability methods stems from the fact that most ensemble methods are black boxes by nature: they minimize prediction error, but they obscure how the predictions are made. This is problematic because predictions are only as good as the conditions under which they are made. For example, Zillow's iBuying program was a notorious failure, likely due in part to a lack of clarity and transparency in how its predictions were made. Could Zillow have done better if the data science team had paid more attention to explaining the model's predictions? We'll never know, but a better understanding of the model would most likely have improved the outcome.

Overall, we need explainability methods because ensemble methods reveal little about how their predictions are made, and not every problem is a causal problem that calls for an inherently interpretable model.

What are they?

Both SHAP and SAGE build on an old but gold-standard concept in economics: the Shapley value. The Shapley value is a solution concept in cooperative game theory. It is named in honor of Lloyd Shapley, who introduced it in 1951 and was awarded the 2012 Nobel Memorial Prize in Economic Sciences.

In its simplest form, the Shapley value solves an allocation problem, and it has a hint of counterfactual reasoning. Imagine a team project where different team members contribute in different ways. Shapley values help determine how much credit each member deserves. The basic principle is that the Shapley value calculates the average contribution of each member by looking at how much value they add across all possible ways the team could have been assembled. Of course, this is a simplified view.
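
To make this concrete, here is a minimal sketch that computes exact Shapley values for a made-up three-person team by averaging each member's marginal contribution over every order in which the team could be assembled; the coalition values are invented for illustration.

```python
from itertools import permutations

# Hypothetical team of three members and the value each coalition produces.
# v(S) is the project's value when only the members in S contribute.
members = ["alice", "bob", "carol"]
coalition_value = {
    frozenset(): 0,
    frozenset({"alice"}): 40,
    frozenset({"bob"}): 30,
    frozenset({"carol"}): 20,
    frozenset({"alice", "bob"}): 80,
    frozenset({"alice", "carol"}): 70,
    frozenset({"bob", "carol"}): 55,
    frozenset({"alice", "bob", "carol"}): 100,
}

def shapley_values(members, v):
    """Average each member's marginal contribution over all join orders."""
    totals = {m: 0.0 for m in members}
    orders = list(permutations(members))
    for order in orders:
        coalition = frozenset()
        for member in order:
            with_member = coalition | {member}
            totals[member] += v[with_member] - v[coalition]
            coalition = with_member
    return {m: total / len(orders) for m, total in totals.items()}

print(shapley_values(members, coalition_value))
# The contributions sum to v(full team) = 100 -- the "efficiency" property
# that makes the Shapley value attractive for credit allocation.
```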

In machine learning, we can think of any predictive model as a set of features that explain an outcome (like team members explaining project success). Using the analogy above, we are interested in allocating the contribution of each feature to the predicted outcome. This is a problem that is easier said than solved, especially as model complexity increases. Lundberg and Lee (2017) present the SHAP solution, and Covert et al. (2020) present the SAGE solution (and the "et al." here is Lundberg and Lee).

SHAP and SAGE serve different purposes:

  1. SHAP explains the contribution of each feature in the model to the prediction for an observation (customer, store, product, region...). Thus, SHAP focuses on local explanation.[1]
  2. SAGE explains the contribution of each feature in the model to the performance of the model. SAGE focuses on global explanation. The goal here is to make feature importance model-agnostic.
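
To make the local/global distinction above concrete, here is a minimal sketch using the `shap` package and Ian Covert's `sage` package on a small scikit-learn dataset; the dataset, model, and estimator settings are illustrative choices, and the exact APIs may differ across package versions.

```python
# A minimal sketch contrasting local (SHAP) and global (SAGE) explanations.
# Assumes the `shap` package and Ian Covert's `sage` package are installed;
# the dataset and model are illustrative choices.
import shap
import sage
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# SHAP: local explanation -- feature contributions to one observation's prediction.
explainer = shap.Explainer(model)             # resolves to TreeExplainer for tree ensembles
local = explainer(X.iloc[:1])
print(dict(zip(X.columns, local.values[0])))  # adds up (with the base value) to the prediction

# SAGE: global explanation -- feature contributions to overall model performance.
imputer = sage.MarginalImputer(model, X.values[:128])   # background data for "absent" features
estimator = sage.PermutationEstimator(imputer, 'mse')   # loss used to measure performance
global_importance = estimator(X.values, y.values)
print(dict(zip(X.columns, global_importance.values)))
```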

Limitations

The way the methods are applied has limitations (mainly inherited from the original Shapley solution). One of the most important is the additive modeling of each feature's contribution: both methods assume that feature effects can be additively decomposed. This is likely to lead to suboptimal explanations in the presence of complex interdependencies and interactions between features, as well as in the presence of correlated features. These methods can also be computationally expensive because they rely on simulations.[2] For more on the limitations, especially the violation of the feature-independence assumption, see Aas et al. (2021) and other follow-up studies that explore the boundary conditions.

How do they work and relate to causality?

For the details of method dynamics, I refer you to Ian Covert's excellent explanation here. I will focus on why the descriptions of SHAP and SAGE resemble causal language. Remember that these methods are looking for an answer about the contribution of features to the outcome (either an individual prediction or the overall performance of the model). Thus, a natural test of the effect of the presence of a feature is to think about the absence of the feature. In other words, both methods use a "what if" question: What if the customer's age was not in the feature set (either individually or in the aggregate)? How would an individual prediction or model performance change?

This is similar to counterfactual reasoning, but it is not the kind of counterfactual that establishes a causal relationship between the features and the outcome. The idea of replacing features is an input to the simulation that both methods rely on (e.g., KernelSHAP in SHAP's case). In other words, it is a way of quantifying the contribution of features by creating alternative models in which a subset of the features is replaced by random variables.
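
The sketch below illustrates the underlying idea under marginal imputation: a feature's "absence" is simulated by replacing it with draws from background data and averaging the predictions. The helper `prediction_without_feature` is hypothetical, and the actual methods repeat this over many feature subsets and combine the differences with Shapley weights.

```python
# A minimal sketch of simulating a feature's "absence": replace it with draws
# from background data and average the model's predictions (marginal imputation).
# This shows the intuition only; KernelSHAP and SAGE repeat this over many
# feature subsets and combine the differences with Shapley weights.
import numpy as np

def prediction_without_feature(model, x, feature_idx, background):
    """Average prediction for observation `x` when feature `feature_idx`
    is replaced by values drawn from the background data."""
    samples = np.tile(x, (len(background), 1))
    samples[:, feature_idx] = background[:, feature_idx]
    return model.predict(samples).mean()

# Contribution-style difference for one feature of one observation
# (hypothetical `model`, `x`, `j`, and `X_background`):
# delta = model.predict(x.reshape(1, -1))[0] - prediction_without_feature(model, x, j, X_background)
```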

While such an approach may sound misleadingly like causality, neither SHAP nor SAGE results can be given a causal interpretation. What these methods do is similar to removing a feature from a regression model and adding it back to see the difference in the result. Without the accompanying assumptions of causal inference and the data needed to support a causal model, such an approach does not lead to causal claims about the effect of a feature on the outcome. So no, SHAP and SAGE scores do not have a causal interpretation.


[1] LIME answers the same question but in a different way (using a surrogate model instead of Shapley values). SHAP provides better consistency (the values add up to the actual prediction of the true model). As we discussed in our post on LIME, both have serious limitations and neither explanation should be overstated. Other ongoing work shows that both can be misaligned with the data when a model has low predictive performance, when the features are not mutually independent, or when there are a large number of features. See Ragodos et al. (2024) for details and potential remedies (including normalization of Shapley values).

[2] In tree-based models, simulations can be avoided and exact Shapley values can be computed in polynomial time by exploiting the tree structure (Lundberg et al., 2020).


Director's cut

In our previous post on LIME, the academic and I motivated the need for explainability: modern machine learning models like gradient boosted trees and neural networks deliver superior prediction accuracy but they often function as black boxes. Some features go in, some predictions come out. No one knows why the model is predicting what it's predicting and whether the performance is driven by the features or the model. Model performance is evaluated and communicated through aggregated metrics (such as MAE, wMAPE, AUC, etc.). As long as the model is working and the metrics look good, business stakeholders don't typically ask how the model is meeting expectations.

The need to understand the inner workings of a model typically arises when:

  • The model starts to make errors (or the observation-level errors that were invisible in the aggregated metrics start to surface once decisions are broken down to the individual level), or the model does a great job and another data science team wants to learn from that success,
  • Some of the features introduced into the model make the decisions unfair and lead to algorithmic bias, likely because the training data is inherently biased, but it is unclear which features are responsible,
  • Stakeholders are seeking causal relationships, usually without realizing it.

Let's think through some examples for each and understand how SHAP or SAGE can help (or not).

Deconstructing model results

Let's assume that the task at hand is to forecast the demand for individual products for an imaginary inventory management team. Inventory management teams typically use these forecasts to determine which items should be reordered and shipped to stores. In most cases, the data science team developing the forecast will track and focus on a combination of error metrics (e.g., RMSE, wMAPE, bias). These metrics are typically aggregated across a number of time periods, product categories, and geographic regions. Even if the metrics look promising at the aggregate level, the model may not perform well during certain periods (such as holidays), for certain products (such as end-of-life or new products), or in regions with unique seasonal characteristics. When this happens, and the reason behind the model's error is not trivial, the analysis typically starts by looking at the SHAP and SAGE values to understand which features are driving the predictions. These values provide a sanity check.

While SHAP provides a local interpretation and drives the conversation about the errors for specific local instances (such as holidays), SAGE highlights the opportunities on a global level. For example, if the model is over-forecasting almost all products within a category, SAGE might help identify the feature that is driving the gap.
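
To illustrate this workflow, the sketch below builds a small synthetic demand dataset (all column names and effect sizes are made up), fits a tree-based forecaster, and compares the mean absolute SHAP values on holiday periods against the overall average to see which features dominate the problematic slice.

```python
# A minimal, self-contained sketch of the workflow described above: a synthetic
# demand dataset stands in for the imaginary forecasting problem, and mean |SHAP|
# values on holiday periods are compared against the overall average.
# Assumes the `shap` package; all column names and effect sizes are made up.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 2000
X = pd.DataFrame({
    "price": rng.uniform(1, 10, n),
    "promo": rng.integers(0, 2, n),
    "is_holiday": rng.random(n) < 0.1,
    "lag_demand": rng.poisson(20, n).astype(float),
})
y = (50 - 3 * X["price"] + 10 * X["promo"]
     + 25 * X["is_holiday"] + 0.5 * X["lag_demand"]
     + rng.normal(0, 5, n))

model = GradientBoostingRegressor(random_state=0).fit(X, y)
explanation = shap.Explainer(model)(X)   # resolves to TreeExplainer for tree models

# Compare which features dominate on the problematic slice (holidays) vs. overall.
holiday = X["is_holiday"].to_numpy()
mean_abs_all = np.abs(explanation.values).mean(axis=0)
mean_abs_holiday = np.abs(explanation.values[holiday]).mean(axis=0)
for name, overall, hol in zip(X.columns, mean_abs_all, mean_abs_holiday):
    print(f"{name:>12}  overall={overall:7.2f}  holidays={hol:7.2f}")
```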

Understanding if the model is biased

Let's assume that a data science team is building a personalized offer model. After calculating each customer's propensity to respond positively to each offer, the model selects the best set of offers to send to each customer. How can the team tell whether the engine favors a certain demographic group and disproportionately sends offers to that group? In other words, how does the team know if the personalized offer model is taking biased actions?

The source of algorithmic bias may be the data itself. Some demographic groups may be underrepresented in the training data. For example, if offers were previously promoted through a social media platform, and the majority of customers who responded to the offer were younger, the algorithm will identify age as an important characteristic that influences whether customers respond positively to an offer. Even if the source data is fairly representative of all demographic groups, there may still be sampling errors that exclude certain demographic groups.

Removing demographic features from the model is one way to eliminate algorithmic bias. However, other features may be correlated with the removed demographic feature. For example, an apparel retailer may intentionally remove age from the model but leave in the time of purchase or the amount purchased online. If these features are correlated with age, they will continue to introduce bias.

SHAP could help identify bias at the individual prediction level by showing how each feature contributes to specific model outputs. Similarly, SAGE can help determine whether certain features have consistently positive or negative effects on certain populations. By examining whether certain attributes such as gender, age, or race disproportionately influence predictions and whether the model is overly dependent on demographic features, data science teams can determine if the model is biased. I encourage the reader to review SHAP's Python documentation for an in-depth analysis of model fairness.
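
The sketch below illustrates one such check on synthetic data: it compares the average SHAP contribution of each feature for younger versus older customers to see whether age (or a proxy for it) systematically shifts one group's propensity scores. The dataset, model, and column names are invented for illustration, and a real fairness analysis would go much further (see the SHAP documentation referenced above).

```python
# A minimal, self-contained sketch of the bias check described above: synthetic
# offer-response data in which age drives the response, a propensity model, and a
# comparison of mean SHAP contributions across age groups. Assumes the `shap`
# package; all column names and effect sizes are made up.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 3000
customers = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "evening_purchases": rng.poisson(3, n),
    "online_spend": rng.gamma(2.0, 50.0, n),
})
# Younger customers respond more often: a biased training signal.
response_prob = 1 / (1 + np.exp(0.08 * (customers["age"] - 40)))
responded = (rng.random(n) < response_prob).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(customers, responded)
explanation = shap.Explainer(model)(customers)   # contributions in log-odds units

# Do demographic features (or their proxies) dominate, and do they push one
# group's propensity scores systematically down?
young = (customers["age"] < 35).to_numpy()
mean_shap_young = explanation.values[young].mean(axis=0)
mean_shap_older = explanation.values[~young].mean(axis=0)
for name, yg, og in zip(customers.columns, mean_shap_young, mean_shap_older):
    print(f"{name:>18}  under-35={yg:+.3f}  35-plus={og:+.3f}")
```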

Making a causal interpretation

Predictive models cannot be used to make causal claims. SHAP and SAGE reveal only the correlational patterns that machine learning models identified in the data, not causal relationships. While traditional SHAP values are powerful for model interpretation, they should not be used for causal interpretation without causal frameworks and domain knowledge. Data science teams should therefore refrain from reporting SHAP or SAGE outputs in response to stakeholder demands for causal interpretations and should instead switch to interpretable (vs. explainable) models.


Implications for data centricity

Data centricity means being true to the data. Can SHAP or SAGE help with data centricity? To some extent. In a completely black-box ensemble method, it is not clear how the actual data is reflected in the model's decisions, i.e., its predictions. In other words, as users of such a model, we have no insight into the relative contribution of each individual feature to the outcome. With SHAP and SAGE, given the assumptions (the most important of which is feature independence), we can gain some insight into how the individual features (the columns of the data matrix) relate to the predictions. This can potentially help reveal whether an implicit assumption is violated. For example, we may have implicitly assumed (based on our conceptual model of the problem) that the most important feature for predicting creditworthiness is disposable income. SHAP and SAGE can show whether this is actually the case, both at the level of individual predictions and at the level of overall model performance. This is another step toward data centricity.
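
As a small illustration, the hypothetical helper below ranks features by their mean absolute SHAP contribution, which is one way to check such an implicit assumption against what the model actually relies on; `explanation` stands for a fitted `shap` Explanation object, and the feature names in the commented usage are made up.

```python
# A minimal sketch of the sanity check described above: rank features by mean
# |SHAP| and compare against the feature we implicitly assumed to matter most.
# `explanation` is assumed to be a fitted shap Explanation; names are hypothetical.
import numpy as np

def rank_features_by_shap(explanation, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    mean_abs = np.abs(explanation.values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Hypothetical usage for a creditworthiness model:
# ranking = rank_features_by_shap(explanation, list(X.columns))
# top_feature, _ = ranking[0]
# print("implicit assumption holds:", top_feature == "disposable_income")
```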


References

  • Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence, 298, 103502.
  • Covert, I., Lundberg, S. M., & Lee, S. I. (2020). Understanding global feature contributions with additive importance measures. Advances in Neural Information Processing Systems, 33, 17212-17223.
  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765-4774.
  • Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67.
  • Ragodos, R., Wang, T., & Feng, L. (2024). From model explanation to data misinterpretation: Uncovering the pitfalls of post hoc explainers in business research. arXiv preprint arXiv:2408.16987.
  • Shapley, L. S. (1951). Notes on the n-person game -- II: The value of an n-person game. Santa Monica, CA: RAND Corporation.
