Explanations can be manipulated and geometry is to blame
Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel

TL;DR
This paper reveals that explanation methods for neural networks can be arbitrarily manipulated through subtle input perturbations, linking this vulnerability to geometric properties of the networks and proposing ways to improve explanation robustness.
Contribution
The paper demonstrates that explanations can be manipulated through input perturbations, relates this vulnerability to the geometry of neural networks, and proposes methods to make explanations more robust.
Findings
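Explanations can be manipulated arbitrarily by visually hardly perceptible input perturbations that keep the network's output approximately constant.
A theoretical link between this susceptibility and certain geometrical properties of neural networks is established, yielding an upper bound on how far explanations can be manipulated.
Based on this bound, mechanisms are proposed that enhance the robustness of explanations.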
Abstract
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
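The effect described in the abstract can be illustrated with a toy sketch. This is not the paper's attack or its defense, only a minimal example under stated assumptions: a hypothetical one-unit model g(x) = softplus_beta(w · x), a plain gradient explanation h(x) = dg/dx, and a hand-picked perturbation delta. The softplus nonlinearity and its parameter beta are illustrative choices made here to contrast a sharply curved model (large beta, close to ReLU) with a smooth one (small beta); the names W, compare, and gradient_explanation are hypothetical as well.

```python
import math

def softplus(z, beta):
    # Numerically stable (1/beta) * log(1 + exp(beta * z)).
    t = beta * z
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / beta

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

# Hypothetical weight vector of a one-unit toy model (assumption).
W = (1.0, 0.0)

def preactivation(x):
    return sum(w * xi for w, xi in zip(W, x))

def output(x, beta):
    # Model output g(x) = softplus_beta(w . x).
    return softplus(preactivation(x), beta)

def gradient_explanation(x, beta):
    # Gradient explanation h(x) = dg/dx = sigmoid(beta * w . x) * w.
    s = sigmoid(beta * preactivation(x))
    return [s * w for w in W]

def compare(beta, x=(-0.05, 0.3), delta=(0.1, 0.0)):
    # Apply a small, hand-picked perturbation and measure how much
    # the output and the explanation each change.
    x_adv = [xi + di for xi, di in zip(x, delta)]
    out_change = abs(output(x_adv, beta) - output(x, beta))
    h = gradient_explanation(x, beta)
    h_adv = gradient_explanation(x_adv, beta)
    expl_change = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_adv, h)))
    return out_change, expl_change

sharp = compare(beta=100.0)   # nearly ReLU: high curvature near w . x = 0
smooth = compare(beta=1.0)    # smooth nonlinearity: low curvature

print("beta=100: output change %.4f, explanation change %.4f" % sharp)
print("beta=1:   output change %.4f, explanation change %.4f" % smooth)
```

In this toy setting the same perturbation moves the output by a comparably small amount for both values of beta, while the gradient explanation changes far more for the sharply curved model than for the smooth one. That is the qualitative picture the abstract points to: the susceptibility of an explanation is tied to how strongly the model's geometry can bend over a small region of input space.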
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education
