On the Robustness of Interpretability Methods
David Alvarez-Melis, Tommi S. Jaakkola

TL;DR
This paper argues that explanations should be robust, meaning that similar inputs should yield similar explanations. It introduces metrics to quantify robustness, shows that current interpretability methods score poorly on them, and proposes ways to enforce robustness on existing approaches.
Contribution
The paper introduces new metrics for robustness, evaluates existing interpretability methods, and proposes techniques to enhance their robustness.
Findings
Current interpretability methods perform poorly on the proposed robustness metrics.
Robustness of explanations can be quantitatively assessed with new metrics.
Proposed methods can improve the robustness of existing interpretability approaches.
Abstract
We argue that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
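The abstract's notion of robustness (similar inputs should give rise to similar explanations) can be made concrete with a local Lipschitz-style ratio: how much the explanation changes relative to how much the input changes, maximized over a small neighborhood. The sketch below is only an illustration under stated assumptions and is not taken from this page: the function names (`local_lipschitz_estimate`, `toy_explainer`), the neighborhood radius `eps`, and the use of random sampling in place of a true maximization are all hypothetical choices. Random sampling yields a lower bound on the worst-case ratio, so a real evaluation would likely need a proper optimizer.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def local_lipschitz_estimate(explain, x, eps=0.1, n_samples=200, seed=0):
    """Estimate how unstable `explain` is around the point `x`.

    Samples perturbations uniformly from the box [-eps, eps]^d (an assumed
    neighborhood) and returns the largest observed ratio
        ||explain(x') - explain(x)|| / ||x' - x||.
    Larger values mean nearby inputs can receive very different explanations.
    """
    rng = random.Random(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = [rng.uniform(-eps, eps) for _ in x]
        dist = l2_norm(delta)
        if dist == 0.0:
            continue  # skip the degenerate zero perturbation
        neighbor = [xi + di for xi, di in zip(x, delta)]
        change = l2_norm([a - b for a, b in zip(explain(neighbor), base)])
        worst = max(worst, change / dist)
    return worst


def toy_explainer(x):
    """Hypothetical explainer: gradient attribution for f(x) = sum(sin(x_i))."""
    return [math.cos(xi) for xi in x]


if __name__ == "__main__":
    point = [0.5, -1.0, 2.0]
    print(local_lipschitz_estimate(toy_explainer, point))
```

An explainer that returns the same attribution everywhere scores 0 under this estimate, while one whose attributions swing sharply between neighboring inputs scores high. The score can therefore be compared across methods evaluated on the same points.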
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Fault Detection and Control Systems
Methods: Interpretability
