Sound Explanation for Trustworthy Machine Learning
Kai Jia, Pasapol Saowakon, Limor Appelbaum, Martin Rinard

TL;DR
This paper critically examines the limitations of attribution-based explanations in machine learning, formalizes the concept of sound explanations, and demonstrates their application in building trust in cancer prediction models.
Contribution
It provides a formal critique of attribution methods, introduces the concept of sound explanations, and applies this framework to improve trust in clinical machine learning models.
Findings
No attribution algorithm can simultaneously satisfy specificity, additivity, completeness, and baseline invariance.
A sound explanation provides sufficient information to causally explain a system's predictions.
Feature selection, applied as a sound explanation to cancer prediction models, is presented as a way to cultivate trust among clinicians.
Abstract
We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models via attributing scores to input components due to inherently conflicting goals of attribution-based interpretation. We prove that no attribution algorithm satisfies specificity, additivity, completeness, and baseline invariance. We then formalize the concept, sound explanation, that has been informally adopted in prior work. A sound explanation entails providing sufficient information to causally explain the predictions made by a system. Finally, we present the application of feature selection as a sound explanation for cancer prediction models to cultivate trust among clinicians.
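The intuition behind feature selection as a sound explanation can be illustrated with a small sketch. This is not the paper's formalization or code; it only assumes that the explanation takes the form "the prediction depends on these features and no others", and that such a claim can be checked by perturbing the unselected features. The names `make_selected_model` and `depends_only_on`, the weights, and the sample input are all hypothetical.

```python
import math
import random


def make_selected_model(weights, bias, selected):
    """Return a logistic predictor that reads only the features in `selected`.

    Hypothetical toy model: `weights[k]` is applied to feature `selected[k]`.
    """
    def predict(x):
        z = bias + sum(w * x[i] for w, i in zip(weights, selected))
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def depends_only_on(predict, x, selected, n_features, trials=100, seed=0):
    """Empirically check that changing unselected features never changes
    the prediction for input `x`. A passing check is evidence, not a proof."""
    rng = random.Random(seed)
    base = predict(x)
    selected_set = set(selected)
    for _ in range(trials):
        perturbed = list(x)
        for i in range(n_features):
            if i not in selected_set:
                perturbed[i] = rng.uniform(-10.0, 10.0)
        if predict(perturbed) != base:
            return False
    return True


# Illustrative values only: five input features, of which two are selected.
selected = [0, 3]
predict = make_selected_model(weights=[0.8, -1.2], bias=0.1, selected=selected)
x = [1.0, 5.0, -2.0, 0.5, 7.0]
sound = depends_only_on(predict, x, selected, n_features=len(x))
print("prediction depends only on selected features:", sound)
```

For a model built this way the claim holds by construction, which is the point of the illustration: the explanation is a statement about the model that is actually true, unlike an attribution score assigned after the fact.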
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
Methods: Feature Selection
