Explanations can be manipulated and geometry is to blame

Ann-Kathrin Dombrowski; Maximilian Alber; Christopher J. Anders; Marcel Ackermann; Klaus-Robert Müller; Pan Kessel

arXiv:1906.07983 · stat.ML · September 26, 2019 · 145 cites

TL;DR

This paper shows that explanations of neural network predictions can be manipulated arbitrarily by barely perceptible input perturbations that leave the network's output approximately unchanged. It traces this vulnerability to geometric properties of the networks and proposes ways to make explanations more robust.

Contribution

The paper demonstrates that explanations can be manipulated through input perturbations, relates this susceptibility to the geometry of neural networks, derives an upper bound on it, and proposes mechanisms that make explanations more robust.

Findings

01. Explanations can be arbitrarily manipulated with minimal input changes.

02. A theoretical link between explanation susceptibility and network geometry is established.

03. Proposed mechanisms improve the robustness of neural network explanations.

Abstract

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
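
The effect described in the abstract can be illustrated on a toy model. The sketch below is not the authors' code or experimental setup: the two-unit network, the input points, and the choice of a plain gradient as the explanation are all hypothetical, picked only so the arithmetic is easy to follow. It compares a ReLU network, whose gradient jumps at the activation kink, with a softplus-smoothed copy. Using softplus here is an assumption about what a robustness mechanism of the kind the abstract mentions could look like; the abstract itself does not name the mechanism.

```python
import math

# Activations and their derivatives.
def relu(z):
    return max(z, 0.0)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def softplus(z, beta=1.0):
    # Smooth stand-in for ReLU; beta controls sharpness (assumed, illustrative).
    return math.log1p(math.exp(beta * z)) / beta

def softplus_grad(z, beta=1.0):
    # Derivative of softplus is the logistic sigmoid of beta * z.
    return 1.0 / (1.0 + math.exp(-beta * z))

# Hypothetical toy network: g(x) = act(x[0] - 0.5) + act(x[1]).
def output(x, act):
    return act(x[0] - 0.5) + act(x[1])

# Gradient explanation: the derivative of g with respect to each input.
def gradient_explanation(x, act_grad):
    return [act_grad(x[0] - 0.5), act_grad(x[1])]

def l2(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

x = [0.49, 1.0]      # original input, just below the kink of the first unit
x_adv = [0.51, 1.0]  # perturbed input, just above the kink

# How much the output and the explanation move under the perturbation.
relu_output_shift = abs(output(x_adv, relu) - output(x, relu))
relu_expl_shift = l2(gradient_explanation(x_adv, relu_grad),
                     gradient_explanation(x, relu_grad))

softplus_output_shift = abs(output(x_adv, softplus) - output(x, softplus))
softplus_expl_shift = l2(gradient_explanation(x_adv, softplus_grad),
                         gradient_explanation(x, softplus_grad))

print("ReLU:     output shift", relu_output_shift, "explanation shift", relu_expl_shift)
print("softplus: output shift", softplus_output_shift, "explanation shift", softplus_expl_shift)
```

In both variants the output moves by roughly 0.01. The ReLU explanation changes by 1 in L2 norm because the perturbation crosses the kink, while the softplus explanation changes by only about 0.005. High curvature of the function around the input is what makes the explanation fragile in this toy case, which matches the abstract's point that geometry governs susceptibility.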

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education