Revisiting the robustness of post-hoc interpretability methods
Jiawen Wei, Hugues Turbé, Gianmarco Mengaldo

TL;DR
This paper introduces a new approach and metrics for fine-grained (sample-level) robustness assessment of post-hoc interpretability methods, revealing that their robustness is linked to coarse-grained performance.
Contribution
It proposes a novel approach and two metrics for detailed robustness evaluation of interpretability methods, addressing limitations of existing coarse-grained assessments.
Findings
Robustness correlates with coarse-grained performance.
New metrics enable fine-grained robustness analysis.
Interpretability methods vary significantly at the sample level.
Abstract
Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint portions of data that a trained deep learning model deemed important to make a decision. However, different post-hoc interpretability methods often provide different results, casting doubts on their accuracy. For this reason, several evaluation strategies have been proposed to understand the accuracy of post-hoc interpretability. Many of these evaluation strategies provide a coarse-grained assessment -- i.e., they evaluate how the performance of the model degrades on average by corrupting different data points across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level, also referred to as fine-grained, assessment. In other words, they do not measure…
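To make the coarse-grained versus fine-grained distinction concrete, the following is a minimal sketch rather than the paper's approach or metrics. It assumes a toy linear model, input-times-weight relevance scores, and a hypothetical helper `degradation_per_sample`. It corrupts the most relevant feature of each sample and compares the averaged degradation with the per-sample values.

```python
import numpy as np

def degradation_per_sample(model, X, relevance, k, baseline=0.0):
    """Drop in model output per sample after corrupting its top-k features.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    X_corrupt = X.copy()
    # Indices of the k features with the highest relevance, per sample.
    top_k = np.argsort(-relevance, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    # Replace those features with a baseline value (assumed to be 0 here).
    X_corrupt[rows, top_k] = baseline
    return model(X) - model(X_corrupt)

# Toy setup (all assumed): a linear "model" and input-times-weight relevance.
weights = np.array([3.0, 1.0, 0.5, 0.0])
model = lambda X: X @ weights

X = np.array([[1.0, 1.0, 1.0, 1.0],
              [0.0, 2.0, 0.0, 5.0],
              [1.0, 0.0, 4.0, 0.0]])
relevance = X * weights

per_sample = degradation_per_sample(model, X, relevance, k=1)  # fine-grained view
coarse = per_sample.mean()                                     # coarse-grained view

print(per_sample)  # [3. 2. 3.]
print(coarse)
```

The single averaged number hides the fact that the second sample degrades less than the others, which is the kind of sample-level variation a coarse-grained assessment cannot show.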
Taxonomy
Topics: Fault Detection and Control Systems · Anomaly Detection Techniques and Applications
