The Manifold Hypothesis for Gradient-Based Explanations
Sebastian Bordt, Uddeshya Upadhyay, Zeynep Akata, Ulrike von Luxburg

TL;DR
This paper proposes that gradient-based explanations are more perceptually meaningful when aligned with the data manifold's tangent space, supported by experiments across multiple datasets and methods.
Contribution
It introduces a framework using variational autoencoders to estimate data manifolds and demonstrates the importance of alignment for explanation quality.
Findings
Aligned attributions are more perceptually meaningful.
Attributions from popular post-hoc methods such as Integrated Gradients and SmoothGrad are more strongly aligned with the data manifold than the raw gradient.
Adversarial training enhances gradient alignment with the data manifold.
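For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the standard technique on a toy function, not the paper's experimental code. The function `f`, its analytic gradient `grad_f`, the all-zeros baseline, and the step count are illustrative assumptions; in practice the gradient would come from a trained network.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients along the straight line baseline -> x.

    grad_fn:  callable returning the gradient of the model output w.r.t. its input.
    x:        input to explain, shape (d,).
    baseline: reference input, shape (d,).
    """
    # Midpoint Riemann sum over interpolation coefficients in (0, 1).
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    # Scale the path-averaged gradient by the input difference.
    return (x - baseline) * avg_grad

# Toy "model" (an assumption for illustration): f(x) = sum(x^2).
def f(x):
    return float(np.sum(x ** 2))

def grad_f(x):
    return 2.0 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
ig = integrated_gradients(grad_f, x, baseline)
```

For this toy function the attributions sum to `f(x) - f(baseline)`, which is the completeness property of Integrated Gradients.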
Abstract
When do gradient-based explanation algorithms provide perceptually-aligned explanations? We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows us to estimate and generate image manifolds. Through experiments across a range of different datasets -- MNIST, EMNIST, CIFAR10, X-ray pneumonia and Diabetic Retinopathy detection -- we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more perceptually-aligned it tends to be. We then show that the attributions provided by popular post-hoc methods such as Integrated Gradients and SmoothGrad are more strongly aligned with the data manifold than the raw gradient. Adversarial training also improves the alignment of model gradients with…
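One way to make the alignment criterion concrete is to measure how much of an attribution vector lies inside the tangent space. The sketch below is an assumption for illustration: the excerpt above does not state the exact measure used, and the function name `tangent_alignment` and the toy basis are hypothetical. In a variational-autoencoder setting, the tangent basis could plausibly be taken from the columns of the decoder's Jacobian at a latent point, which is also an assumption here.

```python
import numpy as np

def tangent_alignment(attribution, tangent_basis):
    """Fraction of the attribution's norm that lies in the span of tangent_basis.

    attribution:   shape (d,), a flattened feature attribution.
    tangent_basis: shape (d, k), columns spanning the estimated tangent space
                   (assumed to have full column rank).
    Returns a value in [0, 1]; 1 means the attribution lies fully in the tangent space.
    """
    # Orthonormalize the basis so the projection is a plain matrix product.
    q, _ = np.linalg.qr(tangent_basis)
    projected = q @ (q.T @ attribution)
    norm = np.linalg.norm(attribution)
    if norm == 0.0:
        return 0.0
    return float(np.linalg.norm(projected) / norm)

# Toy example (hypothetical data): a 2-D tangent plane inside R^3.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
in_plane = np.array([3.0, 4.0, 0.0])   # lies entirely in the plane
off_plane = np.array([0.0, 0.0, 2.0])  # orthogonal to the plane
mixed = np.array([1.0, 0.0, 1.0])      # half in, half out

scores = [tangent_alignment(v, basis) for v in (in_plane, off_plane, mixed)]
```

Under this measure, comparing the score of a raw gradient against that of an Integrated Gradients or SmoothGrad attribution at the same input would indicate which is more aligned with the estimated manifold.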
Taxonomy
Topics: COVID-19 diagnosis using AI · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
Methods: ALIGN
