SHAP scores fail pervasively even when Lipschitz succeeds
Olivier Letoffe, Xuanxiang Huang, Joao Marques-Silva

TL;DR
This paper demonstrates that SHAP scores, widely used in XAI, can be unreliable across various models, including Boolean, Lipschitz continuous, and differentiable regression models, highlighting fundamental limitations.
Contribution
It provides a comprehensive analysis showing that SHAP score issues are pervasive, even in models with desirable mathematical properties like Lipschitz continuity and differentiability.
Findings
SHAP scores are unsatisfactory for Boolean classifiers.
Issues with SHAP scores also occur in regression models.
Problems persist even in Lipschitz continuous and differentiable models.
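For reference, here is the standard textbook definition of Lipschitz continuity. This summary does not say which norm or domain the paper uses, so both are left generic. A model f is Lipschitz continuous with constant L ≥ 0 on its feature space 𝔽 if:

```latex
\lvert f(\mathbf{x}) - f(\mathbf{y}) \rvert \;\le\; L \, \lVert \mathbf{x} - \mathbf{y} \rVert
\qquad \text{for all } \mathbf{x}, \mathbf{y} \in \mathbb{F}
```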
Abstract
The ubiquitous use of Shapley values in eXplainable AI (XAI) has been triggered by the tool SHAP, and as a result Shapley values are commonly referred to as SHAP scores. Recent work devised examples of machine learning (ML) classifiers for which the computed SHAP scores are thoroughly unsatisfactory, in that they can mislead human decision-makers. Nevertheless, such examples could be perceived as somewhat artificial, since the selected classes must be interpreted as numeric. Furthermore, it was unclear how general the issues identified with SHAP scores were. This paper answers these criticisms. First, the paper shows that for Boolean classifiers there are arbitrarily many examples for which the SHAP scores must be deemed unsatisfactory. Second, the paper shows that the issues with SHAP scores are also observed in the case of regression models. In addition, the paper studies the class of regression…
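To make the object under study concrete, the sketch below computes exact SHAP scores for a small Boolean classifier by brute-force enumeration. It relies on two assumptions that are ours, not stated in this summary. First, inputs are uniformly distributed over {0,1}^n. Second, the characteristic function is v(S) = E[κ(x) | x_S = v_S], and each feature's score is the Shapley value φ_i = Σ_{S ⊆ F∖{i}} |S|!(n−|S|−1)!/n! · (v(S∪{i}) − v(S)). The helpers `expected_value` and `shap_scores` are hypothetical names. They are not the SHAP library's API, and they run in exponential time, so they are only suitable for toy examples.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial
from typing import Callable, List, Sequence, Set

# Hypothetical alias: a Boolean classifier maps a 0/1 tuple to 0 or 1.
Classifier = Callable[[tuple], int]


def expected_value(kappa: Classifier, v: Sequence[int], fixed: Set[int]) -> Fraction:
    """E[kappa(x) | x_j = v_j for all j in fixed], with x uniform over {0,1}^n."""
    n = len(v)
    free = [j for j in range(n) if j not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        x = list(v)                   # start from the instance being explained
        for j, b in zip(free, bits):
            x[j] = b                  # enumerate every value of the free features
        total += kappa(tuple(x))
    return Fraction(total, 2 ** len(free))


def shap_scores(kappa: Classifier, v: Sequence[int]) -> List[Fraction]:
    """Exact Shapley value of every feature for instance v (exponential in n)."""
    n = len(v)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(n):            # |S| ranges over 0..n-1
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = set(subset)
                gain = expected_value(kappa, v, s | {i}) - expected_value(kappa, v, s)
                phi += weight * gain
        scores.append(phi)
    return scores


# Toy classifier: x0 AND x1. Feature x2 is irrelevant by construction.
def and01(x: tuple) -> int:
    return x[0] & x[1]


print(shap_scores(and01, (1, 1, 0)))  # [Fraction(3, 8), Fraction(3, 8), Fraction(0, 1)]
```

Exact rationals (`Fraction`) are used so that the scores can be compared without rounding error. In this toy case the irrelevant feature x2 correctly receives 0, and the scores sum to κ(v) − E[κ] = 1 − 1/4 = 3/4 (the efficiency property). The paper's counterexamples, in which SHAP scores misrepresent feature relevance, are not reproduced here. This routine only evaluates the definition, so it could be used to check such examples by hand.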
Taxonomy
Topics: Genomics and Rare Diseases
Methods: Shapley Additive Explanations
