Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think
Haotian Xue, Yongxin Chen

TL;DR
This paper reveals that pixel-space diffusion models (PDMs) are significantly more robust to adversarial attacks than latent diffusion models (LDMs), and that they can be used as effective purifiers against adversarial perturbations.
Contribution
It demonstrates the robustness of pixel diffusion models against white-box attacks and introduces their use as off-the-shelf purifiers for adversarial defenses.
Findings
PDMs are resistant to gradient-based white-box attacks.
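PDMs can serve as off-the-shelf purifiers that effectively remove adversarial patterns crafted against LDMs.
Most current perturbation-based protection methods are therefore insufficient, because the protective perturbation can be purified away.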
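To make "gradient-based white-box attack" concrete, the following is a minimal sketch of projected gradient descent (PGD) under an L-infinity budget, maximizing a denoising loss. It is not the authors' code and does not reproduce the paper's results: the linear noise predictor `W`, the mixing coefficients `a` and `b`, the budget, and the step size are all hypothetical stand-ins chosen so the gradient can be written analytically in NumPy.

```python
import numpy as np

def denoising_loss_and_grad(x, W, a, b, noises):
    """Monte Carlo estimate of E||eps - W(a*x + b*eps)||^2 and its gradient w.r.t. x.

    W is a toy linear noise predictor (an assumption, not a real diffusion model).
    """
    loss, grad = 0.0, np.zeros_like(x)
    for eps in noises:
        residual = eps - W @ (a * x + b * eps)   # prediction error on the noised input
        loss += residual @ residual
        grad += -2.0 * a * (W.T @ residual)      # analytic gradient of the squared error
    n = len(noises)
    return loss / n, grad / n

def pgd_attack(x, W, a, b, noises, budget=8 / 255, step=1 / 255, iters=20):
    """Signed-gradient ascent on the denoising loss, projected onto the L-inf ball."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        _, grad = denoising_loss_and_grad(x + delta, W, a, b, noises)
        delta = np.clip(delta + step * np.sign(grad), -budget, budget)
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep the image in [0, 1]
    return x + delta

rng = np.random.default_rng(0)
d = 16                                        # toy "image" with 16 pixels
x = rng.uniform(0.2, 0.8, size=d)
W = rng.normal(scale=0.3, size=(d, d))        # hypothetical linear denoiser
noises = rng.normal(size=(32, d))             # fixed noise samples for the estimate
a, b = 0.8, 0.6                               # signal/noise mixing, a^2 + b^2 = 1

x_adv = pgd_attack(x, W, a, b, noises)
clean_loss, _ = denoising_loss_and_grad(x, W, a, b, noises)
adv_loss, _ = denoising_loss_and_grad(x_adv, W, a, b, noises)
```

In this toy setting the loss is convex in `x`, so the attack always raises it. The paper's claim is that on real PDMs the analogous attack fails to meaningfully disrupt the model, which a linear stand-in like this cannot demonstrate.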
Abstract
Adversarial examples for diffusion models are widely used as solutions for safety concerns. By adding adversarial perturbations to personal images, attackers cannot edit or imitate them easily. However, it is essential to note that all these protections target latent diffusion models (LDMs), while adversarial examples for diffusion models in the pixel space (PDMs) are largely overlooked. This may mislead us into thinking that diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we present novel findings: even though gradient-based white-box attacks can be used to attack LDMs, they fail to attack PDMs. This finding is supported by extensive experiments with a wide range of attack methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks. We…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Latent Diffusion Model · Diffusion
