Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations

Haniyeh Ehsani Oskouie; Farzan Farnia

arXiv:2212.03095·cs.CV·April 23, 2024

Open Access

TL;DR

This paper demonstrates the existence of universal adversarial perturbations that can consistently manipulate gradient-based neural network interpretations across many images, revealing a vulnerability in current interpretability methods.

Contribution

It introduces the concept of a Universal Perturbation for Interpretation (UPI) and proposes gradient-based and PCA-based methods to generate such perturbations.
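To make the idea of a single perturbation that shifts saliency maps for many inputs concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: it assumes a toy model f(x) = a · tanh(Wx) with random weights, random vectors as a stand-in dataset, and a first-order spectral shortcut loosely in the spirit of a PCA-based approach. All names (`W`, `a`, `X`, `saliency`, `second_moment`, `universal_direction`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 16, 200           # input dim, hidden units, number of samples
W = rng.normal(size=(h, d))    # toy network weights (assumption, not from the paper)
a = rng.normal(size=h)
X = rng.normal(size=(n, d))    # random stand-in for an image dataset

def saliency(x):
    """Simple-gradient saliency map: gradient of f(x) = a . tanh(W x) w.r.t. x."""
    t = np.tanh(W @ x)
    return W.T @ (a * (1.0 - t**2))

def saliency_jacobian(x):
    """Jacobian of the saliency map w.r.t. x (equal to the Hessian of f)."""
    t = np.tanh(W @ x)
    return W.T @ np.diag(a * (-2.0 * t * (1.0 - t**2))) @ W

def second_moment(X):
    """Sum over samples of J(x)^T J(x), where J is the saliency Jacobian."""
    return sum(saliency_jacobian(x).T @ saliency_jacobian(x) for x in X)

def universal_direction(X, eps):
    # First-order model: saliency(x + delta) - saliency(x) ~= J(x) @ delta.
    # Over unit-norm delta, the summed squared change is maximized by the
    # top eigenvector of sum_i J(x_i)^T J(x_i).
    eigvals, eigvecs = np.linalg.eigh(second_moment(X))  # ascending eigenvalues
    return eps * eigvecs[:, -1]

def mean_change(X, delta):
    """Average L2 change in the saliency map caused by adding delta."""
    return np.mean([np.linalg.norm(saliency(x + delta) - saliency(x)) for x in X])

eps = 0.1                              # L2 budget for the perturbation
train, test = X[:100], X[100:]
delta_upi = universal_direction(train, eps)

random_dir = rng.normal(size=d)
delta_rand = eps * random_dir / np.linalg.norm(random_dir)

print("universal direction:", mean_change(test, delta_upi))
print("random direction:   ", mean_change(test, delta_rand))
```

The perturbation is fitted on one half of the samples and evaluated on the other half, mirroring the claim that one input-independent perturbation can affect unseen test samples. A gradient-based variant would instead run projected gradient ascent on the interpretation distance averaged over the training samples.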

Findings

01. UPI can significantly alter neural network interpretations across test samples.

02. Proposed methods effectively generate UPIs that fool interpretation schemes.

03. Numerical results confirm the vulnerability of gradient-based explanations.

Abstract

Interpreting neural network classifiers using gradient-based saliency maps has been extensively studied in the deep learning literature. While the existing algorithms manage to achieve satisfactory performance in application to standard image recognition datasets, recent works demonstrate the vulnerability of widely-used gradient-based interpretation schemes to norm-bounded perturbations adversarially designed for every individual input sample. However, such adversarial perturbations are commonly designed using the knowledge of an input sample, and hence perform sub-optimally in application to an unknown or constantly changing data point. In this paper, we show the existence of a Universal Perturbation for Interpretation (UPI) for standard image datasets, which can alter a gradient-based feature map of neural networks over a significant fraction of test samples. To design such a UPI, we…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Test