SHAP values — Glossary Aria Research

Extended definition

SHAP (SHapley Additive exPlanations) is a model interpretability framework that attributes to each feature its contribution to an individual prediction. It is based on Shapley values, a concept from cooperative game theory formulated by Lloyd Shapley (1953; Nobel Memorial Prize in Economic Sciences, 2012) to allocate a coalition's total gain fairly among its members. Applied to ML, each feature is a “member” of the “coalition” that produces the prediction, and a feature's SHAP value is its marginal contribution averaged over all possible orders in which features can be added. Lundberg and Lee (2017, NeurIPS) unified earlier local interpretability methods (LIME, DeepLIFT, layer-wise relevance propagation) under the SHAP framework and showed that Shapley values satisfy desirable properties (consistency, efficiency, symmetry, dummy) that other methods can violate. Lundberg et al. (2020, Nature Machine Intelligence) presented TreeSHAP, an exact polynomial-time algorithm for trees and tree ensembles (Random Forest, gradient boosting); exact computation for such models was previously computationally prohibitive. Standard visualizations include summary plots (global importance), dependence plots (non-linear effects), and force plots (decomposition of an individual prediction).
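The averaging over inclusion orders described above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the SHAP library's algorithm: the names `shapley_values` and `toy_model` are hypothetical, and an "absent" feature is assumed to take a single baseline value, whereas practical implementations usually average over background data. The cost grows factorially with the number of features, which is why approximations and TreeSHAP exist.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)       # start with every feature "absent"
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # feature i joins the coalition
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this order
            prev = new
    return [p / factorial(n) for p in phi]  # average over all n! orders


def toy_model(v):
    # Hypothetical model with an interaction between features 1 and 2.
    return 2.0 * v[0] + v[1] * v[2] + 1.0


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [2.0, 3.0, 3.0]
print(sum(phi), toy_model(x) - toy_model(baseline))  # 8.0 8.0
```

The second printed line illustrates the efficiency property: the contributions sum to the difference between the prediction and the baseline prediction. The interaction term is split equally between features 1 and 2, which is the symmetry property at work.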

When it applies

SHAP applies to the interpretability of ML models in contexts where per-prediction explanation is a regulatory or ethical requirement: credit (the US Equal Credit Opportunity Act and similar laws require lenders to give reasons for adverse decisions, including automated ones), health (ML-assisted clinical decisions), criminal justice (recidivism and bail risk assessment), personnel selection (compliance with anti-discrimination laws). It is standard in applied ML research when the goal is not only to predict but also to understand the model: to identify dominant features, detect relevant interactions, and validate coherence with domain knowledge. It is especially useful for black-box models (neural networks, gradient boosting with many trees), where direct inspection of parameters is infeasible.

When it does not apply

It does not apply as a substitute for intrinsically interpretable models when those are feasible and sufficient: linear or logistic regression with few features can be interpreted directly, without approximation. It does not apply as causal evidence: SHAP attributes a statistical contribution to the prediction, not a causal relation between feature and outcome; the two are frequently confused. It does not carry over under distribution shift: SHAP values computed on training data may not reflect production behavior if the input distribution has changed. It does not replace fairness validation: a feature with high SHAP values that acts as a proxy for a demographic attribute can introduce discriminatory bias, which requires a dedicated analysis. In datasets with highly correlated features, the way attribution is split among them can be unstable.
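The instability under correlated features can be illustrated with a small sketch. It assumes a linear model and the feature-independence convention, under which the SHAP value of feature i is w_i * (x_i - mean of feature i), as given for Linear SHAP in Lundberg and Lee (2017). The data, the weight vectors, and the name `linear_shap` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def linear_shap(weights, x, background):
    """SHAP values of a linear model under the feature-independence assumption:
    phi_i = w_i * (x_i - mean of feature i over the background data)."""
    means = [mean(col) for col in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


# Hypothetical data in which feature 1 is an exact copy of feature 0.
background = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
x = [3.0, 3.0]

# Three weight vectors that predict identically on this data,
# so a fitting procedure has no reason to prefer one over another.
candidate_models = {
    "all weight on f0": [2.0, 0.0],
    "split evenly": [1.0, 1.0],
    "all weight on f1": [0.0, 2.0],
}

predictions = {
    name: sum(w * xi for w, xi in zip(weights, x))
    for name, weights in candidate_models.items()
}
attributions = {
    name: linear_shap(weights, x, background)
    for name, weights in candidate_models.items()
}

for name in candidate_models:
    print(name, predictions[name], attributions[name])
# all weight on f0 6.0 [3.0, 0.0]
# split evenly 6.0 [1.5, 1.5]
# all weight on f1 6.0 [0.0, 3.0]
```

All three models give the same prediction, yet the attribution for each feature ranges from nothing to everything. With real, imperfectly correlated data the effect is less extreme, but the same mechanism makes the split sensitive to small changes in the fit.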

Applications by field

- Health: explanation of predictive risk scores derived from electronic health records; research in digital phenotyping.
- Finance: credit scoring models whose explainability is required by regulation.
- Scientific ML: identification of biomarker features in genomic and proteomic studies.
- Social sciences: analysis of heterogeneous effects in public policy studies.

Common pitfalls

The first pitfall is confusing SHAP with causality: a feature with a high SHAP contribution is statistically important to the model, but establishing a real causal relationship with the outcome requires experimental design or explicit causal inference. The second is using KernelSHAP (a general, sampling-based approximation) on models for which exact TreeSHAP is available: TreeSHAP runs in polynomial time and returns exact values rather than estimates. The third is interpreting the SHAP values of correlated features in isolation: credit is shared among correlated features in a way that depends on the fitted model, the estimator, and the background data, so reading one feature's value alone can mislead. The fourth is failing to validate coherence with domain knowledge: SHAP values that differ drastically from clinical or business sense may indicate overfitting, data leakage, or a representation problem. The fifth is treating global importance plots as robust insight without checking variability: aggregation over the whole sample can mask different behaviors in subgroups.
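The fifth pitfall can be made concrete with a small sketch that compares global mean |SHAP| with the same statistic per subgroup. Everything here is hypothetical: the model, the four-row dataset, and the helper names `shap_row` and `mean_abs`. Shapley values are computed by brute force, with absent features filled in from the background rows (an assumed convention, not the only one).

```python
from itertools import permutations
from statistics import mean


def model(row):
    # Hypothetical model: feature 0 only matters when the group flag (feature 1) is 1.
    return 3.0 * row[0] * row[1]


def shap_row(predict, x, background):
    """Exact Shapley values for one row; absent features are taken from background rows."""
    n = len(x)

    def value(present):
        # Average prediction with the 'present' features fixed to x's values.
        return mean(
            predict([x[j] if j in present else b[j] for j in range(n)])
            for b in background
        )

    orders = list(permutations(range(n)))
    phi = [0.0] * n
    for order in orders:
        present = set()
        prev = value(present)
        for i in order:
            present.add(i)
            new = value(present)
            phi[i] += (new - prev) / len(orders)
            prev = new
    return phi


# Columns: [feature 0, group flag]
data = [[-1.0, 0.0], [1.0, 0.0], [-1.0, 1.0], [1.0, 1.0]]
shap_matrix = [shap_row(model, row, data) for row in data]


def mean_abs(rows, feature):
    return mean(abs(r[feature]) for r in rows)


global_importance = mean_abs(shap_matrix, 0)
by_group = {
    g: mean_abs([s for s, row in zip(shap_matrix, data) if row[1] == g], 0)
    for g in (0.0, 1.0)
}
print(global_importance)  # 1.5
print(by_group)           # {0.0: 0.75, 1.0: 2.25}
```

The global figure of 1.5 describes neither subgroup: feature 0 carries three times as much attribution in group 1 as in group 0. Breaking importance down by subgroup, or inspecting the spread of per-row values, is a cheap check before drawing conclusions from a summary plot.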