TL;DR
ScoreAdv is a diffusion-based method for generating natural adversarial examples. The authors report that it outperforms existing approaches in attack effectiveness and image quality, and that it remains effective across different models and defenses.
Contribution
The paper presents a diffusion-based adversarial generation technique that combines interpretable adversarial guidance with saliency injection, and that can generate an unlimited number of natural adversarial examples for a variety of target models.
Findings
Achieves state-of-the-art attack success rates.
Generates high-quality natural adversarial images.
Effective against defenses and multiple model types.
Abstract
Despite the success of deep learning across various domains, it remains vulnerable to adversarial attacks. Although many existing adversarial attack methods achieve high success rates, they typically rely on $\ell_p$-norm perturbation constraints, which do not align with human perceptual capabilities. Consequently, researchers have shifted their focus toward generating natural, unrestricted adversarial examples (UAEs). GAN-based approaches suffer from inherent limitations, such as poor image quality due to instability and mode collapse. Meanwhile, diffusion models have been employed for UAE generation, but they still rely on iterative PGD perturbation injection, without fully leveraging their central denoising capabilities. In this paper, we introduce a novel approach for generating UAEs based on diffusion models, named ScoreAdv. This method incorporates an interpretable adversarial…
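For context, the norm-bounded, PGD-style perturbation that the abstract contrasts against can be sketched as below. This is a generic illustration of iterative L-infinity projected gradient descent on a toy linear classifier, not ScoreAdv itself and not code from the paper. The weights `W`, the input `x`, and the values of `eps`, `alpha`, and `steps` are all hypothetical and chosen only for the example.

```python
import math

def logits(W, x):
    # Toy linear classifier: one logit per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def pgd_linf(x, y, W, eps=0.03, alpha=0.01, steps=10):
    """Untargeted L-infinity PGD: push x away from class y within an eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        p = softmax(logits(W, x_adv))
        # Gradient of cross-entropy w.r.t. the logits is (p - onehot(y)).
        err = [p[k] - (1.0 if k == y else 0.0) for k in range(len(W))]
        # Chain rule through the linear layer: grad_x = W^T @ err.
        grad = [sum(W[k][i] * err[k] for k in range(len(W)))
                for i in range(len(x))]
        for i, g in enumerate(grad):
            v = x_adv[i] + alpha * ((g > 0) - (g < 0))   # signed ascent step
            v = min(max(v, x[i] - eps), x[i] + eps)      # project onto eps-ball
            x_adv[i] = min(max(v, 0.0), 1.0)             # keep valid pixel range
    return x_adv

# Hypothetical two-class model and three-"pixel" input.
W = [[1.0, -2.0, 0.5], [-1.0, 1.5, 0.3]]
x = [0.2, 0.6, 0.9]
clean = logits(W, x)
y = max(range(len(W)), key=lambda k: clean[k])  # treat the prediction as the label
x_adv = pgd_linf(x, y, W)
```

The projection step is what enforces the norm constraint: no coordinate of `x_adv` may differ from `x` by more than `eps`, however the change looks to a human observer. Unrestricted adversarial examples drop this per-pixel bound and aim for images that look natural instead.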
Taxonomy
Methods: ALIGN · Diffusion · Focus
