Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach
Lars Nieradzik, Henrike Stephani, Janis Keuper

TL;DR
This paper introduces a robust, perturbation-based evaluation method for attribution maps in CNNs, addressing the limitations of existing metrics and demonstrating improved consistency and reliability across various datasets and architectures.
Contribution
The paper proposes replacing pixel modifications with adversarial perturbations when evaluating attribution maps, which gives a more robust evaluation framework. It also reports a comprehensive quantitative and qualitative assessment of attribution maps.
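For context, the pixel-modification baseline that the paper moves away from can be sketched as a deletion curve: pixels are removed from most to least attributed, and the model score is recorded after each step. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The names `deletion_curve` and `area_under_curve`, the `fill_value` parameter, and the toy linear `model` are all hypothetical. Overwriting pixels with a constant is also what produces the out-of-distribution inputs that the abstract identifies as a source of unreliable rankings.

```python
import numpy as np


def deletion_curve(model, image, attribution, steps=10, fill_value=0.0):
    """Remove pixels from most to least attributed; record the score each step.

    `model` is any callable mapping an image array to a scalar score
    (a hypothetical stand-in for a CNN's class probability).
    """
    order = np.argsort(attribution.ravel())[::-1]  # most important pixels first
    perturbed = image.astype(float).ravel().copy()
    scores = [model(perturbed.reshape(image.shape))]
    chunk = int(np.ceil(order.size / steps))
    for start in range(0, order.size, chunk):
        # Overwrite the next chunk of pixels with a constant fill value.
        perturbed[order[start:start + chunk]] = fill_value
        scores.append(model(perturbed.reshape(image.shape)))
    return np.array(scores)


def area_under_curve(scores):
    """Trapezoidal area under the curve, with the x-axis scaled to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(x)))


# Toy setup (assumption): a linear "model" whose true pixel importances are known.
rng = np.random.default_rng(0)
weights = rng.random((4, 4))
image = np.ones((4, 4))
model = lambda x: float((weights * x).sum())

good_scores = deletion_curve(model, image, attribution=weights, steps=4)
random_scores = deletion_curve(model, image, attribution=rng.random((4, 4)), steps=4)

auc_good = area_under_curve(good_scores)
auc_random = area_under_curve(random_scores)
print(auc_good, auc_random)
```

For the deletion metric, a lower area under the curve indicates a better attribution map, so in this toy setting the map equal to the true weights should score no worse than a random one.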
Findings
The proposed metric is the only one evaluated that passes all sanity checks.
SmoothGrad is identified as the best attribution map.
Rankings are more consistent across datasets and architectures, as measured by Kendall's rank correlation coefficient.
Abstract
In this paper, we present an approach for evaluating attribution maps, which play a central role in interpreting the predictions of convolutional neural networks (CNNs). We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking. Our method proposes to replace pixel modifications with adversarial perturbations, which provides a more robust evaluation framework. By using smoothness and monotonicity measures, we illustrate the effectiveness of our approach in correcting distribution shifts. In addition, we conduct the most comprehensive quantitative and qualitative assessment of attribution maps to date. Introducing baseline attribution maps as sanity checks, we find that our metric is the only contender to pass all checks. Using Kendall's rank correlation coefficient, we show the increased consistency…
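The consistency measure named in the abstract, Kendall's rank correlation coefficient, compares two rankings of the same items by counting concordant and discordant pairs. The sketch below is a minimal illustration, assuming rankings without ties (tau-a). The function name `kendall_tau`, the placeholder method names, and the rank values are hypothetical and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties assumed)."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order items i and j the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ranks (1 = best) of four attribution methods on two datasets.
methods = ["method_A", "method_B", "method_C", "method_D"]
ranks_dataset_1 = [1, 2, 3, 4]
ranks_dataset_2 = [1, 3, 2, 4]

tau = kendall_tau(ranks_dataset_1, ranks_dataset_2)
print(round(tau, 3))  # 5 concordant pairs, 1 discordant pair out of 6
```

A value of 1 means the two rankings agree completely and -1 means they are reversed, so an evaluation metric whose rankings give a higher tau across datasets or architectures is the more consistent one.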
Taxonomy
Topics: Neural Networks and Applications
