DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization
Pucheng Dang, Xing Hu, Dong Li, Rui Zhang, Qi Guo, Kaidi Xu

TL;DR
DiffZOO introduces a purely query-based black-box attack method using Zeroth Order Optimization to effectively test and expose safety vulnerabilities in text-to-image diffusion models without prior model knowledge.
Contribution
The paper presents DiffZOO, a novel black-box attack framework that combines Zeroth Order Optimization with discrete prompt enhancement techniques to red-team T2I models and expose weaknesses in their safety mechanisms.
Findings
Achieves an 8.5% higher attack success rate than previous methods.
Effective against multiple safety mechanisms and online servers.
Demonstrates practical utility for red teaming T2I models.
Abstract
Current text-to-image (T2I) synthesis diffusion models raise misuse concerns, particularly in creating prohibited or not-safe-for-work (NSFW) images. To address this, various safety mechanisms and red teaming attack methods are proposed to enhance or expose the T2I model's capability to generate unsuitable content. However, many red teaming attack methods assume knowledge of the text encoders, limiting their practical usage. In this work, we rethink the case of purely black-box attacks without prior knowledge of the T2I model. To overcome the unavailability of gradients and the inability to optimize attacks within a discrete prompt space, we propose DiffZOO, which applies Zeroth Order Optimization to procure gradient approximations and harnesses both C-PRV and D-PRV to enhance attack prompts within the discrete prompt domain. We evaluated our method across multiple safety…
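The abstract's core idea, approximating a gradient when only function values can be queried, can be illustrated with a generic zeroth-order estimator. The sketch below is not the paper's algorithm: the abstract does not specify the estimator, so the two-point random-direction form, the names `zo_gradient` and `toy_objective`, and the parameters `mu` and `num_samples` are all assumptions made for illustration on a harmless quadratic.

```python
import random


def zo_gradient(f, x, mu=1e-3, num_samples=200, seed=0):
    """Estimate the gradient of f at x using only evaluations of f.

    Assumed form: average of two-point finite differences taken along
    random Gaussian directions. `mu` is the probing step size.
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        # Draw a random probing direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Directional slope from two queries; no analytic gradient is used.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(dim):
            grad[i] += slope * u[i] / num_samples
    return grad


def toy_objective(x):
    # Hypothetical stand-in for a black-box score: a simple quadratic.
    return sum((xi - 1.0) ** 2 for xi in x)


# The true gradient at the origin is [-2, -2, -2]; the estimate is a
# noisy approximation that tightens as num_samples grows.
estimate = zo_gradient(toy_objective, [0.0, 0.0, 0.0], num_samples=2000)
```

Each sample costs two queries, so the query budget grows linearly with `num_samples`; that trade-off between estimate quality and query count is the usual design consideration for this family of methods.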
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Berberine and alkaloids research
Methods: Diffusion
