Diffusion Models for Imperceptible and Transferable Adversarial Attack
Jianqi Chen, Hao Chen, Keyan Chen, Yilan Zhang, Zhengxia Zou, Zhenwei Shi

TL;DR
This paper introduces DiffAttack, a novel adversarial attack method using diffusion models to generate imperceptible, transferable perturbations in the latent space, outperforming existing techniques in various scenarios.
Contribution
It is the first to leverage diffusion models for adversarial attacks, enhancing imperceptibility and transferability by manipulating latent space and distracting model attention.
Findings
Outperforms existing attack methods in success rate
Generates human-insensitive, semantically meaningful perturbations
Effective against various models, datasets, and defenses
Abstract
Many existing adversarial attacks generate L_p-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without L_p-norm constraints, yet they lack transferability when attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further "deceive" the diffusion model which can be viewed…
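The idea of perturbing a latent code rather than pixels can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `DECODER` and `CLASSIFIER` are small fixed linear maps standing in for a generative decoder and a victim classifier, the drift penalty is only a crude stand-in for the content-preserving structures mentioned in the abstract, and gradients are taken by finite differences. It is not the DiffAttack implementation.

```python
import math

# Hypothetical stand-ins (not from the paper): a fixed linear "decoder"
# mapping a 2-d latent to a 3-d "image", and a linear 2-class "classifier".
DECODER = [[1.0, 0.5], [-0.5, 1.0], [0.3, -0.2]]
CLASSIFIER = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def decode(z):
    return matvec(DECODER, z)


def cross_entropy(logits, label):
    # Numerically stable log-sum-exp minus the true-class logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]


def objective(z, z0, label, weight):
    """Classification loss (to be maximized) minus a penalty on image drift."""
    x, x0 = decode(z), decode(z0)
    drift = sum((a - b) ** 2 for a, b in zip(x, x0))
    return cross_entropy(matvec(CLASSIFIER, x), label) - weight * drift


def numeric_grad(f, z, eps=1e-5):
    # Central finite differences; a real attack would use autodiff instead.
    grad = []
    for i in range(len(z)):
        hi = z[:i] + [z[i] + eps] + z[i + 1:]
        lo = z[:i] + [z[i] - eps] + z[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def latent_attack(z0, label, steps=300, lr=0.05, weight=0.1):
    """Gradient ascent on the objective, updating the latent only."""
    z = list(z0)
    for _ in range(steps):
        g = numeric_grad(lambda v: objective(v, z0, label, weight), z)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


def predict(z):
    logits = matvec(CLASSIFIER, decode(z))
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    z0 = [1.0, 0.0]
    adv = latent_attack(z0, label=predict(z0))
    print("clean prediction:", predict(z0), "adversarial prediction:", predict(adv))
```

The point of the sketch is only the structure of the loop: the pixels are never edited directly, and every change to the image is routed through the decoder. The `weight` parameter trades attack strength against how far the decoded image drifts from the original.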
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The crafting of adversarial perturbations in the latent space of diffusion models is quite intuitive and the paper is easy to follow.
- The experimental results in Table 1 showed that the method can generally achieve good performance with respect to a set of baselines.
- The method seems to have adopted the diffusion model to craft adversarial perturbations in a relatively straightforward way, since crafting adversarial attacks in the latent space of generative models has existed before (the diffusion model is a new representative of generative models). This limits the technical novelty of the work.
- In Table 2, the results show that the method performed worse in attacking NIP-r3 and RS, and it performed worse than PI-FGSM when attacking Adv-Inc-v3.
- In Table
1) Evaluation results look promising.
2) The writing is clear.
1) transfer loss: It is confusing that Eq. 4 optimizes the objective of variance, which is not differentiable. According to the description, it optimizes the objective of evenly distributed cross-attention maps. It disturbs the attention recognition of the diffusion process, but it is unclear how the corruption can be transferred to other pure classifiers without diffusion. Briefly, the loss can help attack diffusion, but how can it help transferability?
2) structure loss: if the purpose is to m
- The structure of the paper is well organized and most of the descriptions are very clear.
- The performance of the proposed method is strong: it significantly boosts transferability while keeping a good FID.
- The design of the unrestricted attack is interesting, and the resulting adversarial examples look natural.
- The comparison may not be entirely fair. The proposed method additionally relies on Stable Diffusion to enhance transferability, while other methods can rely only on the substitute models.
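One review above discusses an objective that pushes cross-attention maps toward an even distribution by optimizing their variance. A minimal sketch of what such a variance-style objective could look like on a single flattened attention map is given here. The function names, the four-element map, and the step size are all assumptions for illustration, and this is not the paper's Eq. 4, which is not reproduced on this page. In this simplified form the population variance has a closed-form gradient, 2(a_i - mean)/n; whether the reviewer's concern applies depends on the paper's exact formulation.

```python
def variance(values):
    """Population variance of a flattened attention map."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_grad(values):
    """Analytic gradient of the population variance: 2 * (v_i - mean) / n."""
    n = len(values)
    mean = sum(values) / n
    return [2.0 * (v - mean) / n for v in values]


def flatten_attention(attn, steps=100, lr=0.5):
    """Gradient descent on the variance; the gradient sums to zero,
    so the total attention mass is preserved at every step."""
    a = list(attn)
    for _ in range(steps):
        g = variance_grad(a)
        a = [ai - lr * gi for ai, gi in zip(a, g)]
    return a


if __name__ == "__main__":
    peaked = [0.7, 0.1, 0.1, 0.1]  # hypothetical map focused on one region
    flat = flatten_attention(peaked)
    print("variance before:", variance(peaked))
    print("variance after: ", variance(flat))
```

In a real pipeline the attention values would be a function of the latent, and the gradient would be propagated back to the latent by automatic differentiation rather than applied to the map itself.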
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
