The many Shapley values for model explanation
Mukund Sundararajan, Amir Najmi

TL;DR
This paper examines the different ways the Shapley value is operationalized for model explanation, shows how they disagree and where they go wrong, and proposes Baseline Shapley (BShap), a method backed by a proven uniqueness result.
Contribution
It analyzes the multiplicity of Shapley value implementations, identifies problems with existing methods, and introduces Baseline Shapley, a new approach with a proven uniqueness property.
Findings
Existing Shapley implementations can produce counterintuitive attributions, for instance non-zero attributions to features the model does not even reference.
The proposed Baseline Shapley (BShap) has a formal uniqueness guarantee.
BShap addresses issues in previous attribution methods.
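
As a rough illustration of the baseline idea, here is a minimal sketch. It assumes the commonly described formulation in which a coalition S is scored by evaluating the model with the features in S taken from the input being explained and all other features taken from a fixed baseline. The function name `baseline_shapley`, the toy model, and the all-zeros baseline are hypothetical choices for this example, not taken from the paper, and the exact enumeration over subsets is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def baseline_shapley(f, x, baseline):
    """Exact Shapley values for the game v(S) = f(x on S, baseline elsewhere)."""
    n = len(x)

    def v(subset):
        # Features in `subset` take the explained input's value, the rest the baseline's.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    attributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                attributions[i] += weight * (v(s | {i}) - v(s))
    return attributions

def toy_model(z):
    # Hypothetical model: uses features 0 and 1, never reads feature 2.
    return 2.0 * z[0] + z[0] * z[1]

attr = baseline_shapley(toy_model, x=[1.0, 3.0, 5.0], baseline=[0.0, 0.0, 0.0])
print(attr)  # approximately [3.5, 1.5, 0.0]
```

In this toy case the attributions sum to the difference between the model output at the input and at the baseline (5.0), and the feature the model never reads receives exactly zero.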
Abstract
The Shapley value has become a popular method to attribute the prediction of a machine-learning model on an input to its base features. The use of the Shapley value is justified by citing [16], which shows that it is the unique method that satisfies certain good properties (axioms). There is, however, a multiplicity of ways in which the Shapley value is operationalized in the attribution problem. These differ in how they reference the model, the training data, and the explanation context. They give very different results, rendering the uniqueness result meaningless. Furthermore, we find that previously proposed approaches can produce counterintuitive attributions in theory and in practice; for instance, they can assign non-zero attributions to features that are not even referenced by the model. In this paper, we use the axiomatic approach to study the differences between…
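
The claim about unreferenced features can be illustrated with a small sketch. It assumes one particular operationalization, in which a coalition S is scored by the conditional expectation of the model output given that the features in S match the input, estimated over an empirical dataset. The excerpt above does not name the approach it has in mind, so this choice, the helper names, and the two-row dataset are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def conditional_value(f, data, x, subset):
    # Average f over the rows of `data` that agree with x on every feature in `subset`.
    rows = [r for r in data if all(r[i] == x[i] for i in subset)]
    return sum(f(r) for r in rows) / len(rows)

def shapley(value, n):
    # Exact Shapley values for an arbitrary set function `value` over n players.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi

def model(z):
    # Hypothetical model: reads feature 0 only.
    return z[0]

# Hypothetical data in which feature 1 is a perfect copy of feature 0.
data = [(0.0, 0.0), (1.0, 1.0)]
x = (1.0, 1.0)

ces_attr = shapley(lambda s: conditional_value(model, data, x, s), n=2)
print(ces_attr)  # [0.25, 0.25]
```

Because feature 1 is perfectly correlated with feature 0 in this data, conditioning on it shifts the expected output, so it receives the same attribution as the feature the model actually uses.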
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Bayesian Modeling and Causal Inference
