Saliency Methods for Explaining Adversarial Attacks
Jindong Gu, Volker Tresp

TL;DR
This paper investigates how well saliency methods explain adversarial attacks on neural networks, and proposes an enhancement to Guided Backpropagation that improves interpretability and achieves state-of-the-art performance.
Contribution
It introduces a simple enhancement to Guided Backpropagation, significantly improving its ability to explain adversarial misclassifications in neural networks.
Findings
Enhanced GuidedBP outperforms previous saliency methods in explaining adversarial attacks.
Saliency maps with the proposed method contain more class-discriminative information.
The approach is computationally efficient and improves interpretability of adversarial examples.
Abstract
The classification decisions of neural networks can be misled by small, imperceptible perturbations. This work aims to explain the misled classifications using saliency methods. The idea behind saliency methods is to explain the classification decisions of neural networks by creating so-called saliency maps. Unfortunately, a number of recent publications have shown that many of the proposed saliency methods do not provide insightful explanations. A prominent example is Guided Backpropagation (GuidedBP), which simply performs (partial) image recovery. However, our numerical analysis shows that the saliency maps created by GuidedBP do indeed contain class-discriminative information. We propose a simple and efficient way to enhance the saliency maps. The proposed enhanced GuidedBP achieves state-of-the-art performance in explaining adversarial classifications.
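To make the saliency idea in the abstract concrete, here is a minimal sketch of standard Guided Backpropagation next to a plain gradient saliency map, written in NumPy for a toy two-layer ReLU network. Everything in it is an assumption for illustration: the network, the weight names `W1` and `W2`, and the helper functions are hypothetical and are not taken from the paper. It shows only the usual GuidedBP rule (pass a backward signal through a ReLU only where both the forward pre-activation and the incoming gradient are positive). It does not implement the enhancement the authors propose, which this summary does not specify.

```python
import numpy as np

def forward(x, W1, W2):
    """Toy ReLU network (hypothetical): logits = W2 @ relu(W1 @ x)."""
    z = W1 @ x                # hidden-layer pre-activations
    h = np.maximum(z, 0.0)    # ReLU
    return z, W2 @ h

def gradient_saliency(x, W1, W2, target):
    """Plain gradient of the target logit with respect to the input."""
    z, _ = forward(x, W1, W2)
    grad_h = W2[target]               # d logit[target] / d h
    grad_z = grad_h * (z > 0)         # standard ReLU backward rule
    return W1.T @ grad_z

def guided_backprop_saliency(x, W1, W2, target):
    """Standard GuidedBP: additionally drop negative backward signals at the ReLU."""
    z, _ = forward(x, W1, W2)
    grad_h = W2[target]
    grad_z = grad_h * (z > 0) * (grad_h > 0)   # guided ReLU rule
    return W1.T @ grad_z

# Random toy weights and input, used only to exercise the two functions.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=4)
target = 1

vanilla = gradient_saliency(x, W1, W2, target)
guided = guided_backprop_saliency(x, W1, W2, target)
```

Both functions return one value per input dimension. For an image classifier the same array would be reshaped to the image size and displayed as a saliency map. The only difference between the two is the extra `(grad_h > 0)` mask, which is what makes GuidedBP maps look sharper than raw gradients.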
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
