Explaining the unexplainable Part I: LIME

 

Image courtesy of finalyse.com

Academic's take

Every model is a simplified version of reality, as it must be. That is fine as long as we know and understand how reality is simplified and reduced to the parameters of a model. In predictive analytics, where nonparametric models are heavily used with a kitchen-sink approach of adding any and all features to improve predictive performance, we don't even know how a model simplifies reality. So, what if we use another model to simplify and explain the nonparametric predictive model? This other model is called a surrogate model, and it is designed to be interpretable. In short, surrogate models explore the boundary conditions of decisions made by a predictive model.

What is a surrogate model?

Surrogate models can help us understand (i) the average prediction of a model (global surrogate) or (ii) a single prediction (local surrogate). The quest then becomes finding surrogate models that can explain the predictions, and do so faithfully (a term explained below). Two increasingly popular surrogate algorithms that focus on the single-prediction problem (local explanations) are LIME and SHAP.

What is LIME?

LIME, or local interpretable model-agnostic explanations, was originally proposed by Ribeiro et al. (2016) to faithfully explain the predictions of any classifier or regressor by locally approximating it with an interpretable model.[1] In a sense, LIME assumes that any black-box model can be approximated linearly on a local scale. Accordingly, LIME fits a simpler glass-box model around a single observation and simulates the behavior of the global model in the locality of that observation.

How does it work?

LIME accomplishes this task almost like applying bootstrapping around a prediction. Let's say we have a main black-box model that is trained on a dataset and has produced predictions. In the next section, the Director uses a toy dataset where the outcome is the probability of a customer responding to a marketing campaign. Using this example, LIME's objective is to explain the contribution of each feature to the predicted probability for each customer. In the language of this dataset, LIME works as follows:

  1. For any observation to explain (for any customer's probability of responding), take the customer's data and generate (N) synthetic customers with some perturbations around the customer's feature values (dimensionality reduction to limit the number of features may be needed at this point)
  2. As an intermediate step, calculate the distances between the original customer and the synthetic customers in the perturbed data, and convert each distance into a similarity score to be used as a weight (synthetic customers that are closer to the actual customer are assigned larger weights)
  3. Use the main model to predict the outcome in the perturbed data (predict the probability of responding for the synthetic customers)
  4. Fit a simpler explainer model to the perturbed data (synthetic customers) that explains the predicted outcome (the customer's probability of responding) using a subset of the (K) most important features
Note that the weights in Step 2 are an input to the model in Step 4. For example, if a LASSO model is used in Step 4, the weights from Step 2 make it a weighted LASSO. Because LASSO serves as a feature selection algorithm, the output of Step 4 will contain only the (K) most important variables. A decision tree could easily replace LASSO and the procedure would be the same.

Formally, LIME follows this pseudocode when the explainer is a weighted LASSO:
Input: x*         - the observation to be explained, with all features
Input: N          - sample size for the explainer model
Input: K          - the number of most important features to keep
Input: similarity - a kernel that converts distances into weights

Let f() be the black-box model and
x' = h(x*) be a version of x* after dimensionality reduction (only if needed)

for i in 1...N {
  z'[i] <- sample_around(x')      # perturbed (synthetic) observation
  y'[i] <- f(z'[i])               # black-box prediction for z'[i]
  w'[i] <- similarity(x', z'[i])  # weight: closer samples count more
}
return K-LASSO(y', z', w')        # weighted LASSO fit on the perturbed samples
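
To make this concrete, here is a minimal Python sketch of the same procedure, with scikit-learn's Lasso as the weighted explainer and an exponential kernel turning distances into similarity weights. The sampling scheme, the kernel width, and the function name are illustrative assumptions, not part of the original algorithm.

import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(f, x_star, X, n_samples=5000, alpha=0.01, kernel_width=None):
    """Minimal LIME-style local explanation with a weighted LASSO explainer.

    f      : callable returning black-box predictions, e.g. lambda Z: model.predict_proba(Z)[:, 1]
    x_star : 1-D array, the observation (customer) to explain
    X      : 2-D array of training data, used only to scale the perturbations
    """
    rng = np.random.default_rng(0)
    p = x_star.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(p)          # heuristic width (an assumption)

    scale = X.std(axis=0)
    scale = np.where(scale == 0, 1.0, scale)      # guard against constant features

    # Step 1: perturb the observation to create synthetic customers
    Z = x_star + rng.normal(size=(n_samples, p)) * scale

    # Step 2: distance -> similarity weight (closer synthetic customers weigh more)
    d = np.linalg.norm((Z - x_star) / scale, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)

    # Step 3: query the black-box model on the perturbed data
    y = f(Z)

    # Step 4: fit a weighted LASSO as the local, interpretable explainer
    explainer = Lasso(alpha=alpha)
    explainer.fit(Z, y, sample_weight=w)
    return explainer.coef_                        # local feature contributions

With a fitted classifier clf and feature matrix X, calling lime_explain(lambda Z: clf.predict_proba(Z)[:, 1], X[49], X) would return one coefficient per feature for that customer; the LASSO penalty plays the role of keeping only the K most important ones.
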
Like most data-driven approaches to model explanation that don't include a conceptual model to guide the process, LIME suffers from a lack of robustness. In addition to the expected sensitivity due to hyperparameter selection, LIME explanations become increasingly unstable as model complexity increases. Perhaps more importantly, the explanations are highly sensitive to the feature values of the original observation (Alvarez-Melis and Jaakkola, 2018). That is, LIME changes its explanations even when the feature values change only slightly (e.g., a customer who is 27.5 years old instead of 28).

LIME also seems to be (mis)used for global explanations through some aggregation of the individual explanations (value aggregation, such as averaging, and/or rank aggregation, such as taking a majority vote). This creates additional problems, as the absolute value of a feature's contribution is likely to be less informative than its relative importance within a given observation. Moreover, the frequency with which a feature appears across observations may be driven by factors other than what the explanation is meant to capture (van der Linden et al., 2019). As Ribeiro et al. (2016) themselves point out:

"We note that local fidelity does not imply global fidelity: features that are globally important may not be important in the local context, and vice versa." 2

[1] In the context of interpretability, faithfulness means explaining a model locally around the prediction in such a way that the explanation reflects the observed behavior of the model. It is important to note that faithfulness is different from the plausibility of an explanation. “Plausibility” refers to how convincing the interpretation is to humans, while “faithfulness” refers to how accurately it reflects the true reasoning process of the model (Jacovi and Goldberg, 2020). A plausible but unfaithful interpretation can be dangerous, as we see signs of this in the so-called hallucinations of large language models.

[2] In this case, the fidelity or faithfulness of an explanation is how accurately the explanation reflects the true reasoning process behind a model's prediction (Jacovi and Goldberg, 2020).


Director's cut

AI models are often black boxes — highly accurate, but not easily interpretable. This lack of interpretability is a significant hurdle when it comes to building trust in the model and explaining which features most strongly influence the outcome. Take, for example, predicting whether a customer will purchase. It's possible to use a boosted tree model to make highly accurate predictions for each customer. However, if the business objective is to increase the probability of purchase, understanding the individual contributions of each feature to the purchase decision becomes critical. The goal may be to understand how much each feature contributes, on average, to the purchase decisions of all customers. This requires a global understanding of the model. Customers may be prioritizing price, product reviews, a flexible return policy, and/or free shipping. Understanding the ranking of these features will inform the overall strategy. If the goal is to personalize the experience by understanding how much each feature contributes to each customer's purchase decision, we need a local understanding that is specific to each customer. 

Until recently, feature importance was the only interpretive output available for black-box models. It is still popular and remains the tool of choice for many modelers who want to understand the behavior of a model. This method is global in its interpretation, meaning that it provides an overall understanding of the model's behavior rather than focusing on individual predictions. However, feature importance has its limitations:

1) First, feature importance measures how much a feature reduces a model's error, not how sensitive the output is to that feature. In the case of customer purchase probability, if a customer's age is ranked as an important feature, it means that having age in the model considerably reduces the prediction error. It does not quantify whether and how much the purchase probabilities will change with age.

2) Feature importance is not reliable if some of the features are correlated. In the customer purchase example, age may be correlated with another characteristic such as education or income level. If all three of these highly correlated characteristics (age, income, and education) are in the model, the importance is split between them, as the short simulation after this list illustrates.

3) Above all, feature importance is not consistent across models. The feature importance rankings from a Random Forest model do not necessarily match the importance rankings from an XGBoost model.
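
To illustrate the second point, here is a small simulation (a toy example, not the campaign data): one true driver of the outcome, one nearly identical proxy for it, and one irrelevant feature. The random forest spreads the importance of the single underlying signal across the two correlated columns.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
signal = rng.normal(size=n)                        # the true driver (think: age)
proxy = signal + rng.normal(scale=0.1, size=n)     # a highly correlated proxy (think: income)
noise = rng.normal(size=n)                         # an unrelated feature
y = 2.0 * signal + rng.normal(scale=0.5, size=n)   # the outcome depends on the signal only

X = np.column_stack([signal, proxy, noise])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# The importance of the single true driver is split between the two correlated columns.
print(dict(zip(["signal", "proxy", "noise"], rf.feature_importances_.round(2))))
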

In the figure below, I use two different models on the same data to predict whether a customer will respond positively to a marketing campaign. The features include the customer's age, education, income, marital status, recency, the number of children or teenagers in the household, and the amounts purchased in different product categories (such as fruit, candy, and meat products) over the past two years.

Random Forest ranks Recency, Amount of Gold (store brand) Products Purchased, Number of Teenagers at Home, and Amounts of Meat and Wine Products Purchased as the most important features. XGBoost selects Being Divorced, Being Single, Living Together, Having a 2nd Cycle Education, and Number of Teenagers at Home. Needless to say, the different encodings of marital status are highly correlated. There is little overlap between the two methods in the list of important features and their rankings.
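
The figure cannot be reproduced in text, but the comparison behind it is easy to sketch: fit both models on the same feature matrix and put their importance rankings side by side. The dataset generated below is a stand-in for illustration; with the campaign features and response in its place, the two ranking columns show the kind of disagreement described above.

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Stand-in for the campaign data; replace with the real feature matrix and response.
X_arr, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(8)])

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
xgb = XGBClassifier(n_estimators=500, random_state=0).fit(X, y)

# Rank features by importance within each model and put the rankings side by side.
rankings = pd.DataFrame({
    "random_forest": pd.Series(rf.feature_importances_, index=X.columns).rank(ascending=False),
    "xgboost": pd.Series(xgb.feature_importances_, index=X.columns).rank(ascending=False),
}).sort_values("random_forest")
print(rankings)
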


In summary, feature importance is a useful tool for evaluating the overall behavior of a particular model and what the model considers to be the most important variables, but the insights it provides about the impact of each feature on the outcome are limited. 

If the goal is to understand why the black-box model produced a particular prediction for a particular instance, LIME provides local (in our specific example, customer-level) interpretations. It uses the original black-box model to make predictions for small variations in the data. For example, if our goal is to understand the most important features for a 34-year-old customer with an annual income of $85,000, LIME first generates predictions for additional age groups (say 32, 33, 35, 36) and income levels (say $83K, $84K, $86K, $87K). Next, LIME fits an interpretable model to this newly generated data set. What it measures is the impact of each feature on the outcome, for each customer. The impact scores are local.
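
In practice there is no need to hand-roll the procedure: the lime package implements it. Below is a minimal sketch, assuming a stand-in dataset and a random forest in place of the real campaign data and model; explain_instance performs the perturbation, weighting, and local fitting in one call.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Stand-in data and model; replace with the campaign features and the fitted classifier.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(6)]
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=feature_names,
    class_names=["no response", "response"],
    mode="classification",
)

# Explain a single customer (row 49 here), keeping the 5 most influential features.
exp = explainer.explain_instance(X[49], rf.predict_proba, num_features=5)
print(exp.as_list())   # (feature condition, signed local contribution) pairs
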

In the example below, I took one of the customers (Customer #49) and used LASSO to explain how each feature contributed to the customer's response to the campaign. Looking at the output of the Random Forest model, not having a teenager in the household is associated with the greatest decrease in the probability of Customer #49 responding to the campaign, followed by not being married. Conversely, the amounts of Gold brand and meat products purchased by Customer #49 are associated with an increase in the probability of responding to the campaign.


As the example shows, LIME's individual-level feature importance rankings are also model-dependent. What is identified as an important feature may change if the underlying model changes. However, the feature rankings are more consistent at the individual level than at the global model level. This makes sense because both models are tree-based, and the trees are grown in roughly the same way in the neighborhood of a particular individual (while they may vary greatly at the higher, model-wide level across all customers).

To summarize:

1) Black-box models achieve high predictive performance, but they do a poor job of explaining why a particular prediction is made. They are not meant to be used for interpretation.

2) Feature importance and LIME provide further insights into the behavior of the model, but they are also not universal. The underlying model could easily change the ranking of features.


Implications for data-informed decision making

Academic's take

The most straightforward implication is the ability to explain the results of a predictive black-box model at the individual level. If the model is built and evaluated at an aggregate level but the decisions require individual-level inputs, this is obviously very useful. What's more, the "individual" in individual-level explanations doesn't have to be a person; it can be a store or a group of stores in a region. If decisions differ across subsets of the data and carry different implications, the results of LIME can be useful.

From a data centricity perspective, black-box predictive methods have the largest gap between the actual data and the model insights generated using the data.[3] Point predictions are not as helpful in understanding the fit between the data and insights because they are doubly blind: (i) the nature of the fit is unclear due to nonlinearity and (ii) averaging over all observations obscures individual data points.

Director's cut

Predictive analytics methods excel at one primary task: improving predictions. To achieve this goal, they typically start with interpretable models, blend them to create ensembles, and boost weak learners to reduce error. Each of these actions distances the output from the data and makes the model uninterpretable.

Unfortunately, the distinction between model accuracy and model interpretability has been lost in the data science community. Building a model that is both accurate and interpretable is difficult. It requires a good understanding of the data generation process and the domain in which the data resides.

Using surrogates, which are interpretable models trained on the predictions of an uninterpretable model, we can bridge the gap.


Implications for data centricity

Data centricity is staying true to the data when modeling the data. The fundamental premise of predictive modeling is that the training set is a good representation of the test set. Only if the patterns observed historically continue into the future will a predictive model continue to perform well. When those patterns begin to shift, or in other words, when the data drifts, understanding the impact of individual features on the prediction can provide some insight into a potential solution. However, more often than not, the predictive model that yields the lowest prediction error is a "black-box" model. Since there is no insight into the set of features that explain a particular prediction, it is not possible to evaluate the fit to the data at hand.

In this example, we have shown how LIME can be used to interpret the features at the local (observation) level, potentially increasing the chances of more data-centric decisions than a typical predictive model alone would allow.

[3] See datacentricity.org for a one-pager on how we define data centricity.


References

  • Alvarez-Melis, D., & Jaakkola, T. (2018). Towards robust interpretability with self-explaining neural networks. Advances in Neural Information Processing Systems, 31.
  • Jacovi, A., & Goldberg, Y. (2020). Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? arXiv preprint arXiv:2004.03685.
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386.
  • van der Linden, I., Haned, H., & Kanoulas, E. (2019). Global aggregations of local explanations for black box models. arXiv preprint arXiv:1907.03039.
