Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
Chenyu Zhang, Lanjun Wang, Anan Liu

TL;DR
This paper introduces a gradient-based method to craft adversarial prompts that exploit vulnerabilities in Stable Diffusion, revealing how targeted harmful images can be generated and exposing underlying model weaknesses.
Contribution
The study formulates the targeted adversarial attack problem on Stable Diffusion and proposes a novel prompt optimization framework to demonstrate model vulnerabilities.
Findings
Effective adversarial prompts can reliably generate targeted images.
The method uncovers specific mechanisms behind model vulnerabilities.
Experiments validate the success of targeted attacks on Stable Diffusion.
Abstract
Recent developments in text-to-image models, particularly Stable Diffusion, have marked significant achievements in various applications. With these advancements come growing safety concerns that malicious entities can exploit the model's vulnerabilities to generate targeted harmful images. However, existing work on model vulnerability mainly evaluates the alignment between the prompt and the generated images, and falls short of revealing the vulnerability associated with targeted image generation. In this study, we formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts. Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide Stable Diffusion to generate specific images. Furthermore, after obtaining successful adversarial…
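The abstract names a gradient-based embedding optimization but the text here does not give its details. The following is only a toy sketch of the general idea, not the authors' method: a continuous embedding is moved by gradient descent until a differentiable model's output matches a target, and the result is then mapped to the closest entry of a small vocabulary. The linear map W, the function names optimize_embedding and nearest_token, and the nearest-neighbour projection are all assumptions made for illustration; a real text-to-image pipeline is nonlinear and far larger.

```python
import numpy as np


def optimize_embedding(W, target, steps=1000, seed=0):
    """Minimise ||W e - target||^2 over a continuous embedding e.

    W is a HYPOTHETICAL linear stand-in for the differentiable path from
    a prompt embedding to image features (an assumption, not the paper's model).
    Returns the optimised embedding and the loss recorded at each step.
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(size=W.shape[1])  # random starting embedding
    # Step size derived from the largest singular value keeps the update stable.
    lr = 0.5 / np.linalg.norm(W, 2) ** 2
    losses = []
    for _ in range(steps):
        residual = W @ e - target
        losses.append(float(residual @ residual))
        grad = 2.0 * (W.T @ residual)  # gradient of the squared error w.r.t. e
        e = e - lr * grad
    residual = W @ e - target
    losses.append(float(residual @ residual))  # loss after the final update
    return e, losses


def nearest_token(embedding, vocab_embeddings):
    """Index of the vocabulary row closest to `embedding` (Euclidean distance).

    This projection back to discrete tokens is an illustrative assumption.
    """
    distances = np.linalg.norm(vocab_embeddings - embedding, axis=1)
    return int(np.argmin(distances))


# Toy usage: recover an embedding whose output matches a reachable target.
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 4))        # toy model: 4-d embedding -> 16-d features
hidden = rng.normal(size=4)         # embedding that produces the target
optimized, losses = optimize_embedding(W, W @ hidden)
vocab = rng.normal(size=(10, 4))    # toy vocabulary of 10 token embeddings
token_id = nearest_token(optimized, vocab)
```

In this toy setting the loss falls steadily because the objective is a convex quadratic; nothing here implies the same behaviour for an actual diffusion model.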
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
