The (Un)reliability of saliency methods

Pieter-Jan Kindermans; Sara Hooker; Julius Adebayo; Maximilian Alber; Kristof T. Schütt; Sven Dähne; Dumitru Erhan; Been Kim

arXiv:1711.00867 · stat.ML · November 6, 2017 · 163 cites

TL;DR

This paper demonstrates that many saliency methods for explaining neural networks are unreliable because they are sensitive to input transformations that do not affect the model's predictions, highlighting the need for input invariance.

Contribution

The paper introduces the concept of input invariance as a criterion for reliable saliency methods and shows that many existing methods fail this criterion, leading to misleading explanations.

Findings

01. Adding a constant shift to the input can alter saliency attributions without affecting the model's predictions.

02. Saliency methods that lack input invariance produce misleading explanations.

03. Ensuring input invariance is crucial for trustworthy model explanations.
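
The first finding can be illustrated with a short worked derivation. This is a sketch for a single linear unit, not the paper's full experimental setup; the symbols $w$, $b$, $m$, $f_1$, and $f_2$ are notation assumed here for illustration. A constant shift $m$ on the input is absorbed by adjusting the bias, so the two models agree on every prediction, yet a gradient-times-input attribution changes while the plain gradient does not.

```latex
\begin{align}
f_1(x) &= w^\top x + b, \\
x' &= x + m, \qquad f_2(x') = w^\top x' + \bigl(b - w^\top m\bigr) = f_1(x), \\
\frac{\partial f_1}{\partial x} &= \frac{\partial f_2}{\partial x'} = w, \\
w \odot x' &= w \odot x + w \odot m \;\neq\; w \odot x
  \quad \text{whenever } w \odot m \neq 0 .
\end{align}
```

Under these assumptions, the gradient attribution is identical for both models, while the gradient-times-input attribution differs by $w \odot m$ even though the predictions are the same.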

Abstract

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step, adding a constant shift to the input data, to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
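
The constant-shift argument in the abstract can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy model (a single linear unit) rather than the deep networks studied in the paper; the helper names `predict`, `gradient_saliency`, and `gradient_times_input` are illustrative and do not come from the paper or from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: a single linear unit f(x) = w . x + b.
w = rng.normal(size=4)
b = 0.5
x = rng.normal(size=4)

# Constant shift added to every input; the second model compensates
# for it in the bias, so both models compute the same function.
m = np.full(4, 2.0)
x_shifted = x + m
b_shifted = b - w @ m


def predict(weights, bias, inputs):
    """Output of the linear unit."""
    return weights @ inputs + bias


def gradient_saliency(weights, inputs):
    """Gradient of the output w.r.t. the input; for a linear unit this is w."""
    return weights.copy()


def gradient_times_input(weights, inputs):
    """Elementwise product of the gradient and the input."""
    return weights * inputs


out_original = predict(w, b, x)
out_shifted = predict(w, b_shifted, x_shifted)

grad_original = gradient_saliency(w, x)
grad_shifted = gradient_saliency(w, x_shifted)

gi_original = gradient_times_input(w, x)
gi_shifted = gradient_times_input(w, x_shifted)

print("same prediction:          ", np.isclose(out_original, out_shifted))
print("same gradient attribution:", np.allclose(grad_original, grad_shifted))
print("same gradient * input:    ", np.allclose(gi_original, gi_shifted))
```

In this sketch the two models agree on the prediction and on the gradient attribution, while the gradient-times-input attribution differs by `w * m`. That is the kind of sensitivity to a prediction-irrelevant transformation that input invariance is meant to rule out.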

Code & Models

Repositories

albermax/innvestigate (TensorFlow)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning