Full-Gradient Representation for Neural Network Visualization
Suraj Srinivas, Francois Fleuret

TL;DR
This paper introduces full-gradients, a comprehensive neural network interpretability method that decomposes responses into input and neuron sensitivities, providing more complete and precise explanations than previous saliency map techniques.
Contribution
The paper proposes full-gradients as a novel interpretability tool satisfying completeness and weak dependence, with an efficient approximation called FullGrad for convolutional networks.
Findings
FullGrad explains model behaviour correctly and more comprehensively than other methods in the literature.
Saliency maps from FullGrad are sharper and better localized.
FullGrad outperforms other methods in two quantitative tests: pixel perturbation and remove-and-retrain.
Abstract
We introduce a new tool for interpreting neural net responses, namely full-gradients, which decomposes the neural net response into input sensitivity and per-neuron sensitivity components. This is the first proposed representation which satisfies two key properties: completeness and weak dependence, which provably cannot be satisfied by any saliency map-based interpretability method. For convolutional nets, we also propose an approximate saliency map representation, called FullGrad, obtained by aggregating the full-gradient components. We experimentally evaluate the usefulness of FullGrad in explaining model behaviour with two quantitative tests: pixel perturbation and remove-and-retrain. Our experiments reveal that our method explains model behaviour correctly, and more comprehensively than other methods in the literature. Visual inspection also reveals that our saliency maps are…
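The decomposition described in the abstract can be illustrated with a small sketch. It assumes the usual form of completeness for ReLU networks with biases, which this summary does not spell out: the output equals the input-gradient term plus the sum of the bias-gradient terms, f(x) = ∇ₓf(x)·x + Σ ∇_b f(x)·b. The tiny two-layer network, its random weights, and every name below (W1, b1, w2, b2, forward, full_gradient) are hypothetical and chosen only for illustration; this is not the authors' implementation of FullGrad.

```python
import random

random.seed(0)
N_IN, N_HID = 3, 5

# Hypothetical 2-layer ReLU network: f(x) = w2 . relu(W1 x + b1) + b2
W1 = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [random.gauss(0, 1) for _ in range(N_HID)]
w2 = [random.gauss(0, 1) for _ in range(N_HID)]
b2 = random.gauss(0, 1)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def forward(x):
    """Return the scalar output and the hidden pre-activations."""
    pre = [dot(row, x) + b for row, b in zip(W1, b1)]
    out = dot(w2, [max(p, 0.0) for p in pre]) + b2
    return out, pre


def full_gradient(x):
    """Gradients of the output w.r.t. the input and each bias (assumed form)."""
    _, pre = forward(x)
    # df/db1: output weight where the ReLU is active, zero elsewhere
    grad_b1 = [w if p > 0 else 0.0 for w, p in zip(w2, pre)]
    # df/dx: back-propagate the hidden-layer gradient through W1
    grad_x = [sum(grad_b1[j] * W1[j][i] for j in range(N_HID))
              for i in range(N_IN)]
    grad_b2 = 1.0  # the output bias enters f additively
    return grad_x, grad_b1, grad_b2


x = [random.gauss(0, 1) for _ in range(N_IN)]
out, _ = forward(x)
grad_x, grad_b1, grad_b2 = full_gradient(x)

input_term = dot(grad_x, x)                        # input sensitivity
bias_term = dot(grad_b1, b1) + grad_b2 * b2        # per-neuron sensitivity
assert abs(out - (input_term + bias_term)) < 1e-9  # completeness check
```

In this sketch the input-gradient term alone generally does not reproduce the output; the bias terms account for the remainder, which is one way to see why a map built only from the input gradient falls short of completeness.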
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Cell Image Analysis Techniques
Methods: Interpretability
