Diffusion Models for Imperceptible and Transferable Adversarial Attack

Jianqi Chen; Hao Chen; Keyan Chen; Yilan Zhang; Zhengxia Zou; Zhenwei Shi

arXiv:2305.08192 · cs.CV · December 1, 2023 · 6 cites

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces DiffAttack, a novel adversarial attack method using diffusion models to generate imperceptible, transferable perturbations in the latent space, outperforming existing techniques in various scenarios.

Contribution

It is the first to leverage diffusion models for adversarial attacks, enhancing imperceptibility and transferability by manipulating latent space and distracting model attention.

Findings

01

Outperforms existing attack methods in success rate

02

Generates human-insensitive, semantically meaningful perturbations

03

Effective against various models, datasets, and defenses

Abstract

Many existing adversarial attacks generate $L_{p}$-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without $L_{p}$-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further "deceive" the diffusion model which can be viewed…
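The idea of perturbing a latent code rather than pixels can be illustrated with a toy sketch. This is not the DiffAttack implementation: the linear `DECODER`, the linear classifier weights `CLF_W`, the `keep` penalty, and all numeric values are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
# Toy illustration only: a linear "decoder" and a linear "classifier" stand in
# for the diffusion model and the target network (both are assumptions).

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical decoder: maps a 2-D latent to a 4-pixel "image".
DECODER = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0]]
# Hypothetical classifier: a positive score means "true class".
CLF_W = [0.8, -0.2, 0.5, 0.3]

def score(z):
    """Classifier score of the image decoded from latent z."""
    return dot(CLF_W, matvec(DECODER, z))

def latent_attack(z0, steps=50, lr=0.1, keep=0.5):
    """Gradient descent on the latent: lower the score while staying near z0."""
    # d(score)/dz = DECODER^T @ CLF_W, constant because everything is linear.
    grad_score = [dot([row[j] for row in DECODER], CLF_W) for j in range(len(z0))]
    z = list(z0)
    for _ in range(steps):
        # Loss = score(z) + keep * ||z - z0||^2
        grad = [g + 2.0 * keep * (zi - z0i)
                for g, zi, z0i in zip(grad_score, z, z0)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

z0 = [1.0, 0.5]             # latent of the clean "image"
z_adv = latent_attack(z0)   # perturbed latent
x_adv = matvec(DECODER, z_adv)  # the adversarial "image" is decoded, never edited directly
```

The pixels are only ever produced by the decoder, so every perturbation is constrained to what the decoder can express; in the paper's setting that role is played by a diffusion model rather than a fixed matrix.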

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The crafting of adversarial perturbations in the latent space of diffusion models is quite intuitive, and the paper is easy to follow.
- The experimental results in Table 1 show that the method generally achieves good performance with respect to a set of baselines.

Weaknesses

- The method seems to adopt the diffusion model to craft adversarial perturbations in a relatively straightforward way, since crafting adversarial attacks in the latent space of generative models has been done before (the diffusion model is a new representative of generative models). This limits the technical novelty of the work.
- In Table 2, the results show that the method performed worse in attacking NIP-r3 and RS, and it performed worse than PI-FGSM when attacking Adv-Inc-v3.
- In Table

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

1) Evaluation results look promising.
2) The writing is clear.

Weaknesses

1) Transfer loss: It is confusing that Eq. 4 optimizes a variance objective, which is not differentiable. According to the description, it optimizes for evenly distributed cross-attention maps. This disturbs the attention recognition of the diffusion process, but it is unclear how the corruption transfers to other pure classifiers that do not use diffusion. Briefly, the loss can help attack the diffusion model, but how does it help transferability?
2) Structure loss: if the purpose is to m
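For readers following point 1, the quantity under discussion can be sketched as the variance of a flattened cross-attention map. The snippet below is a minimal illustration under stated assumptions: the paper's Eq. 4 is not reproduced on this page, so the plain population variance, the function names, and the example maps are all hypothetical. For a plain variance the gradient has a simple closed form, which the code computes; whether this matches the paper's actual loss is not established here.

```python
def attention_variance(attn):
    """Population variance of a flattened attention map (assumed loss form)."""
    n = len(attn)
    mean = sum(attn) / n
    return sum((a - mean) ** 2 for a in attn) / n

def variance_grad(attn):
    """Analytic gradient of the population variance w.r.t. each entry.

    d/da_i Var(a) = 2 * (a_i - mean) / n, because the deviations sum to zero.
    """
    n = len(attn)
    mean = sum(attn) / n
    return [2.0 * (a - mean) / n for a in attn]

# Hypothetical 4-cell attention maps: one concentrated, one evenly spread.
peaked = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

# One gradient-descent step moves the peaked map toward the uniform one.
lr = 0.5
stepped = [a - lr * g for a, g in zip(peaked, variance_grad(peaked))]
```

An evenly spread map has zero variance, so driving this quantity down corresponds to the "evenly distributed cross-attention maps" mentioned in the review.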

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The paper is well organized and most of the descriptions are very clear.
- The proposed method performs strongly: it significantly boosts transferability while keeping a good FID.
- The design of the unrestricted attack is interesting, and the resulting adversarial examples look natural.

Weaknesses

- The comparison with other papers may not be entirely fair. The proposed method additionally relies on Stable Diffusion to enhance transferability, while the other methods can rely only on the substitute models.

Code & Models

Repositories

windvchen/diffattack
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion