Joint Universal Adversarial Perturbations with Interpretations

Liang-bo Ning; Zeyu Dai; Wenqi Fan; Jingran Su; Chao Pan; Luning Wang; Qing Li

arXiv:2408.01715 · cs.CR · August 6, 2024

Open Access

TL;DR

This paper introduces a novel framework for generating universal adversarial perturbations that simultaneously deceive deep neural networks and mislead their interpretability methods, highlighting a new security concern.

Contribution

The paper is the first to propose and empirically validate joint universal adversarial perturbations that target both DNN predictions and their interpretation (attribution) maps.
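The contribution can be summarized as a single optimization problem. The formulation below is a generic one, assumed for illustration; this page does not state the paper's actual loss. Here f is the classifier, g is an attribution method that returns an attribution map, (x_i, y_i) for i = 1..N are benign samples with labels, ℓ is a classification loss such as cross-entropy, ε bounds the perturbation, and λ trades off the two goals. All of these symbols are introduced here, not taken from the paper.

```latex
\min_{\delta \,:\, \|\delta\|_\infty \le \epsilon} \;
\frac{1}{N} \sum_{i=1}^{N}
\Big[ -\,\ell\big(f(x_i + \delta),\, y_i\big)
\;+\; \lambda \,\big\| g(x_i + \delta) - g(x_i) \big\|_2^2 \Big]
```

The single perturbation δ is shared by every sample, which is what makes it universal. The first term rewards misclassification; the second keeps each adversarial attribution map close to its benign counterpart, so an interpretation-based check would see little difference between the two.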

Findings

1. JUAP (joint universal adversarial perturbation) effectively fools DNN classifiers across datasets.

2. JUAP also misleads attribution maps, undermining the interpretability of the attacked model.

3. This is the first demonstration of a joint attack on models and their explanations.
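The findings do not say how "misleading" an attribution map is measured. One common way to compare two attribution maps is top-k feature intersection: the share of the k most important features that both maps agree on. The sketch below assumes that metric, which may differ from what the paper uses, and the two attribution vectors are made-up illustration values, not results.

```python
def topk_indices(attr, k):
    """Indices of the k features with the largest absolute attribution."""
    return set(sorted(range(len(attr)), key=lambda i: abs(attr[i]), reverse=True)[:k])


def topk_overlap(attr_a, attr_b, k):
    """Fraction of top-k features shared by two attribution maps (1.0 = identical top-k set)."""
    return len(topk_indices(attr_a, k) & topk_indices(attr_b, k)) / k


# Hypothetical attribution maps over six input features.
benign = [0.9, 0.1, 0.05, 0.7, 0.0, 0.3]
adversarial = [0.1, 0.8, 0.6, 0.05, 0.0, 0.2]

print(topk_overlap(benign, adversarial, 3))  # → 0.3333333333333333
```

A low overlap between benign and perturbed maps is the kind of discrepancy a detector could flag; a joint attack aims to push this overlap back toward 1.0 while still flipping predictions.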

Abstract

Deep neural networks (DNNs) have significantly boosted performance on many challenging tasks. Despite this progress, DNNs have also proven vulnerable: recent studies have shown that adversaries can manipulate DNN predictions by adding a single universal adversarial perturbation (UAP) to benign samples. In parallel, increasing efforts have been made to help users understand and explain the inner workings of DNNs by highlighting the parts of a sample that are most informative for its prediction (i.e., attribution maps). We first find empirically that the attribution maps of benign and adversarial examples differ significantly, a discrepancy that could be used to detect universal adversarial perturbations and thereby defend against adversarial attacks. This finding motivates us to further investigate a new research problem: whether there exist…
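The setting in the abstract can be made concrete with a small numerical sketch. Everything in it is a hypothetical stand-in chosen for illustration, since this page does not describe how the paper generates JUAPs: a three-input toy network in place of a DNN, the input gradient as the attribution map, a hinge on the classification margin as the attack loss, a squared attribution gap as the interpretation penalty, and finite-difference projected sign-gradient descent as the optimizer. The parameters eps, lr, steps, lam, kappa, and h are likewise assumed values.

```python
import math
import random

# Hypothetical toy model (not from the paper): score(x) = sum_j v_j * tanh(w_j . x) + b.
# A positive score means class +1, a negative score means class -1.
W = [[1.0, -0.5, 0.3], [0.2, 0.8, -1.0], [-0.7, 0.4, 0.9]]
V = [1.2, -0.9, 0.6]
B = 0.1


def score(x):
    return sum(v * math.tanh(sum(a * b for a, b in zip(w, x))) for w, v in zip(W, V)) + B


def saliency(x):
    """Plain gradient attribution map: d score / d x, computed analytically."""
    grad = [0.0] * len(x)
    for w, v in zip(W, V):
        t = math.tanh(sum(a * b for a, b in zip(w, x)))
        coef = v * (1.0 - t * t)  # derivative of v * tanh(u) with respect to u
        for k, wk in enumerate(w):
            grad[k] += coef * wk
    return grad


def joint_objective(delta, data, lam, kappa=0.1):
    """Attacker minimizes this: a hinge on the margin plus an attribution-gap penalty."""
    total = 0.0
    for x, y in data:
        x_adv = [a + d for a, d in zip(x, delta)]
        margin = y * score(x_adv)  # positive while x_adv is still correctly classified
        gap = sum((p - q) ** 2 for p, q in zip(saliency(x_adv), saliency(x)))
        total += max(margin, -kappa) + lam * gap
    return total / len(data)


def sign(g):
    return (g > 0) - (g < 0)


def craft_joint_uap(data, eps=0.5, lr=0.05, steps=100, lam=1.0, h=1e-4):
    """Projected sign-gradient descent on ONE perturbation shared by all inputs."""
    dim = len(data[0][0])
    delta = [0.0] * dim
    best, best_obj = list(delta), joint_objective(delta, data, lam)
    for _ in range(steps):
        base = joint_objective(delta, data, lam)
        grad = []
        for k in range(dim):  # forward finite differences avoid second-order autodiff
            bumped = list(delta)
            bumped[k] += h
            grad.append((joint_objective(bumped, data, lam) - base) / h)
        # Step against the gradient sign, then clip onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d - lr * sign(g))) for d, g in zip(delta, grad)]
        obj = joint_objective(delta, data, lam)
        if obj < best_obj:  # keep the best perturbation seen so far
            best, best_obj = list(delta), obj
    return best, best_obj


def fooling_rate(delta, data):
    flipped = sum(1 for x, y in data if y * score([a + d for a, d in zip(x, delta)]) <= 0)
    return flipped / len(data)


random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
# Label each point with the clean model's own prediction, so clean accuracy is 100%.
data = [(x, 1 if score(x) > 0 else -1) for x in points]

delta, obj = craft_joint_uap(data)
print("delta:", [round(d, 3) for d in delta])
print("fooling rate:", fooling_rate(delta, data))
```

Raising lam favors attribution maps that stay close to the benign ones at the cost of fewer flipped predictions; lowering it does the opposite. Finite differences are used only because the toy model is tiny; for a real DNN one would differentiate through the attribution map with automatic differentiation instead.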

Taxonomy

Topics: Adversarial Robustness in Machine Learning