The (Un)reliability of saliency methods
Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim

TL;DR
This paper demonstrates that many saliency methods for explaining neural networks are unreliable because they are sensitive to input transformations that do not affect the model's predictions, highlighting the need for input invariance.
Contribution
The paper introduces the concept of input invariance as a criterion for reliable saliency methods and shows that many existing methods fail this criterion, leading to misleading explanations.
Findings
Adding a constant shift to the input can alter saliency attributions without affecting the model's predictions.
Saliency methods lacking input invariance produce misleading explanations.
Ensuring input invariance is crucial for trustworthy model explanations.
Abstract
Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step, adding a constant shift to the input data, to show that a transformation with no effect on the model can cause numerous methods to attribute incorrectly. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
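The constant-shift argument can be illustrated with a small sketch. The example below is not the paper's experiment: it assumes a hypothetical linear model with made-up weights, and it compares plain gradients against gradient-times-input attribution. The names `w`, `b1`, `b2`, `m`, and `f` are illustrative only. The second model absorbs the shift into its bias, so both models produce the same output on their respective inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = w.x + b (illustrative values, not from the paper).
w = rng.normal(size=4)
b1 = 0.5
x1 = rng.normal(size=4)

# Pre-processing step: add a constant shift m to the input.
m = np.full(4, 2.0)
x2 = x1 + m

# The second model compensates for the shift in its bias,
# so that f(x2, b2) equals f(x1, b1).
b2 = b1 - w @ m


def f(x, b):
    """Output of the linear model with shared weights w and bias b."""
    return w @ x + b


# For a linear model, the gradient of the output w.r.t. the input is w,
# regardless of the bias or of where the input sits.
grad1 = w.copy()
grad2 = w.copy()

# Gradient-times-input attribution depends on the input values themselves.
gxi1 = grad1 * x1
gxi2 = grad2 * x2

predictions_match = bool(np.isclose(f(x1, b1), f(x2, b2)))
gradient_invariant = bool(np.allclose(grad1, grad2))
grad_x_input_invariant = bool(np.allclose(gxi1, gxi2))

print("same prediction:          ", predictions_match)
print("gradient unchanged:       ", gradient_invariant)
print("gradient*input unchanged: ", grad_x_input_invariant)
```

In this sketch the two attributions differ by exactly `w * m`, a term that comes from the pre-processing shift and not from anything the model does differently. That is the kind of sensitivity the input invariance criterion is meant to rule out.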
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
