AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai, Kaisheng Liang, Bin Xiao

TL;DR
AdvDiff introduces a novel diffusion model-based approach for generating realistic unrestricted adversarial examples, effectively bypassing defenses and outperforming existing methods on large-scale datasets like ImageNet.
Contribution
The paper presents two new adversarial guidance techniques for diffusion models that enable stable, high-quality, and interpretable generation of adversarial examples.
Findings
AdvDiff outperforms state-of-the-art attack methods on MNIST and ImageNet.
Generated adversarial examples are realistic and effective in fooling classifiers.
The approach demonstrates stability and interpretability in adversarial sampling.
Abstract
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models. This practice lacks theoretical justification, and incorporating the adversarial objective in this way produces unrealistic examples, especially for GAN-based methods on large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable in generating…
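The abstract is truncated before it defines the two guidance techniques, so the following is only a generic illustration of the broader idea of steering a reverse sampling step with a classifier gradient. It is not AdvDiff's method. Everything here is an assumption made for illustration: the linear-softmax classifier (`W`, `b`), the stand-in `denoise_mean` function, and the `scale` and `sigma` parameters are all hypothetical, and a real diffusion model would replace the toy denoiser.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def target_log_prob_grad(x, W, b, target):
    """Gradient of log p(target | x) w.r.t. x for a toy linear-softmax classifier.

    With logits z = W x + b, d/dx log p_t = W[t] - sum_k p_k W[k].
    """
    probs = np.exp(log_softmax(W @ x + b))
    return W[target] - probs @ W


def guided_reverse_step(x_t, denoise_mean, sigma, W, b, target, scale, rng):
    """One toy reverse step followed by a classifier-gradient nudge.

    denoise_mean is a hypothetical stand-in for a diffusion model's
    predicted mean; it is not taken from the paper.
    """
    mean = denoise_mean(x_t)
    # Ordinary (unguided) reverse sample around the predicted mean.
    x_prev = mean + sigma * rng.standard_normal(x_t.shape)
    # Nudge the sample toward the attack target class.
    grad = target_log_prob_grad(x_prev, W, b, target)
    return x_prev + scale * sigma**2 * grad


if __name__ == "__main__":
    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # 3 classes, 2-D input
    b = np.zeros(3)
    x_t = np.array([0.5, -0.2])
    shrink = lambda v: 0.9 * v  # toy denoiser, assumption only

    # Same seed for both calls, so the only difference is the guidance term.
    plain = guided_reverse_step(x_t, shrink, 0.3, W, b, 1, 0.0, np.random.default_rng(0))
    guided = guided_reverse_step(x_t, shrink, 0.3, W, b, 1, 1.0, np.random.default_rng(0))
    print("target log-prob, unguided:", log_softmax(W @ plain + b)[1])
    print("target log-prob, guided:  ", log_softmax(W @ guided + b)[1])
```

Running the two calls with the same random seed isolates the effect of the guidance term: the guided sample differs from the unguided one only by the gradient nudge toward the target class.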
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
