Towards better understanding of gradient-based attribution methods for Deep Neural Networks
Marco Ancona, Enea Ceolini, Cengiz Öztireli, Markus Gross

TL;DR
This paper provides a formal comparison and unified framework for four gradient-based attribution methods in DNNs, introduces a new evaluation metric, and empirically assesses their effectiveness across multiple datasets and architectures.
Contribution
It offers a formal analysis of attribution methods, a unified framework for comparison, and introduces a novel evaluation metric for attribution quality.
Findings
Formal conditions of equivalence and approximation between methods
A new unified framework for gradient-based attribution methods
Empirical evaluation using Sensitivity-n metric across datasets
Abstract
Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image…
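To make the idea of a Sensitivity-n style check concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the metric works as it is usually described: for random subsets of n input features, the sum of their attributions is compared (via Pearson correlation) with the drop in model output when those features are removed. The toy linear model, the zero baseline used for "removal", the choice of Gradient * Input as the attribution method, and all function names are assumptions made for illustration.

```python
import math
import random


def linear_model(weights, bias, x):
    """Toy model f(x) = w . x + b (hypothetical stand-in for a DNN)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def gradient_times_input(weights, x):
    """Gradient * Input attributions; for a linear model df/dx_i = w_i."""
    return [w * xi for w, xi in zip(weights, x)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def sensitivity_n_correlation(model, attributions, x, n, trials=200, seed=0):
    """Correlate summed attributions with output drops over random subsets.

    Assumption: a feature is "removed" by setting it to a zero baseline.
    """
    rng = random.Random(seed)
    base_output = model(x)
    attr_sums, output_drops = [], []
    for _ in range(trials):
        subset = rng.sample(range(len(x)), n)  # n distinct feature indices
        masked = list(x)
        for i in subset:
            masked[i] = 0.0
        attr_sums.append(sum(attributions[i] for i in subset))
        output_drops.append(base_output - model(masked))
    return pearson(attr_sums, output_drops)


# Hypothetical weights and input, chosen only for illustration.
weights = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]
bias = 0.1
x = [1.0, 2.0, -1.5, 0.5, 3.0, -2.0]


def model(inp):
    return linear_model(weights, bias, inp)


attributions = gradient_times_input(weights, x)
corr = sensitivity_n_correlation(model, attributions, x, n=3)
print(f"Pearson correlation for n=3: {corr:.4f}")
```

For this linear toy model the correlation should be very close to 1, because each attribution equals the exact contribution of its feature. For a nonlinear network the correlation would generally fall below 1 and vary with n.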
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
