Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion
Li Zheng, Liangbin Xie, Jiantao Zhou, He YiMin

TL;DR
This paper introduces UDAP, a diffusion-based adversarial purification framework designed for Stable Diffusion models that removes adversarial noise and improves robustness against a range of attack strategies.
Contribution
We propose UDAP, a novel diffusion-based purification method tailored for SD, utilizing DDIM metric loss and dynamic optimization to defend against diverse adversarial attacks.
Findings
UDAP effectively removes adversarial noise from contaminated training images for SD.
UDAP demonstrates robustness against multiple attack methods.
The dynamic epoch adjustment improves purification efficiency.
Abstract
Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversarial noise from contaminated data. However, existing purification methods are primarily designed for classification tasks and fail to address SD-specific adversarial strategies, such as attacks targeting the VAE encoder, UNet denoiser, or both. To address this gap in SD security, we propose Universal Diffusion Adversarial Purification (UDAP), a novel framework tailored for defending against adversarial attacks targeting SD models. UDAP leverages the distinct reconstruction behaviors of clean and adversarial images during Denoising Diffusion Implicit Models (DDIM) inversion to optimize the purification process. By minimizing the DDIM metric loss, UDAP can effectively remove adversarial noise. Additionally, we introduce…
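The abstract does not define the DDIM metric loss, so the following is only a rough sketch of the general idea it describes: invert an image with deterministic DDIM steps, reconstruct it, and measure how far the reconstruction lands from the original. It assumes the metric is a mean squared reconstruction error, which may differ from the paper's definition. The names `alpha_bar_schedule`, `ddim_step`, `ddim_reconstruction_loss`, and `eps_model` are hypothetical; `eps_model` stands in for the SD UNet noise predictor, and the "image" is a flat list of floats rather than a VAE latent.

```python
import math

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alphas_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alphas_bar.append(running)
    return alphas_bar

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update moving x from noise level a_from to a_to."""
    # Estimate the clean signal implied by the predicted noise.
    x0_pred = [(xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)
               for xi, ei in zip(x, eps)]
    # Re-noise that estimate to the target level with the same predicted noise.
    return [math.sqrt(a_to) * x0 + math.sqrt(1.0 - a_to) * ei
            for x0, ei in zip(x0_pred, eps)]

def ddim_reconstruction_loss(x, eps_model, alphas_bar):
    """MSE between x and its DDIM inversion + reconstruction (assumed metric)."""
    levels = [1.0] + list(alphas_bar)  # level 0 is the clean image
    z = list(x)
    # Inversion: walk from the clean level toward the noisiest level.
    for t in range(len(levels) - 1):
        z = ddim_step(z, eps_model(z, t), levels[t], levels[t + 1])
    # Reconstruction: walk back down to the clean level.
    for t in range(len(levels) - 1, 0, -1):
        z = ddim_step(z, eps_model(z, t), levels[t], levels[t - 1])
    return sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)

# Toy usage: tanh is a placeholder noise predictor, not a trained model.
toy_eps = lambda z, t: [math.tanh(v) for v in z]
loss = ddim_reconstruction_loss([0.5, -1.0, 2.0], toy_eps, alpha_bar_schedule(10))
```

In a purification setting along the lines the abstract describes, a loss of this kind would be minimized with respect to the input image; that optimization loop is omitted here because the source does not specify it.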
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
