A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion
Haomin Zhuang, Yihua Zhang, Sijia Liu

TL;DR
This paper introduces a query-free adversarial attack against Stable Diffusion, showing that small perturbations to a text prompt can significantly alter the generated image without any end-to-end model queries.
Contribution
It proposes the first query-free attack approach, which exploits the text encoder's lack of robustness and uses the most influential text-embedding dimensions (steerable key dimensions) to manipulate image outputs.
Findings
A five-character perturbation can cause significant image content shifts.
Targeted attacks can steer specific image content while leaving the rest of the image largely unchanged.
The method does not require end-to-end model queries.
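The query-free idea in the findings above can be sketched as a search that touches only a text encoder and never generates an image. This is a minimal illustration, not the paper's implementation. `toy_text_encoder` is a hypothetical stand-in (hashed character trigrams) so that the example runs without model weights, and plain random search is used for brevity rather than any particular optimizer. A real experiment would pass in a function that returns CLIP text embeddings.

```python
import hashlib
import math
import random
import string

DIM = 64  # toy embedding width (assumption); real text embeddings are larger


def toy_text_encoder(prompt):
    """Hypothetical stand-in for a text encoder: hashed character trigrams.

    This is NOT CLIP. It only gives the search something deterministic to score.
    """
    vec = [0.0] * DIM
    padded = "  " + prompt.lower() + "  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        index = digest[0] % DIM
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def random_suffix_attack(prompt, encoder, suffix_len=5, trials=500, seed=0):
    """Search for a short suffix that pushes the prompt embedding away from
    the clean one. Only the text encoder is called; no image is generated."""
    rng = random.Random(seed)
    clean = encoder(prompt)
    best_suffix, best_sim = None, float("inf")
    for _ in range(trials):
        suffix = "".join(rng.choice(string.ascii_letters) for _ in range(suffix_len))
        sim = cosine_similarity(clean, encoder(prompt + " " + suffix))
        if sim < best_sim:
            best_suffix, best_sim = suffix, sim
    return best_suffix, best_sim


if __name__ == "__main__":
    suffix, sim = random_suffix_attack("a photo of a dog on the grass", toy_text_encoder)
    print("suffix:", suffix, "similarity to clean prompt:", round(sim, 3))
```

The objective here is the cosine similarity between the clean and perturbed embeddings, so a lower score means a larger shift in embedding space. Whether that shift changes the generated image would still have to be checked by running the full model afterwards.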
Abstract
Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character…
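To make the notion of influential embedding dimensions concrete, the sketch below shows one plausible way to select them and to restrict a similarity score to them. The selection rule (average the embedding differences between prompt pairs that differ in one concept, then threshold) is an assumption made for illustration, since the abstract excerpt does not spell out the procedure. The function names, the threshold, and the three-dimensional vectors are all hypothetical.

```python
import math


def key_dimension_mask(embeddings_with, embeddings_without, threshold):
    """Return a 0/1 mask over embedding dimensions.

    Assumed rule: a dimension counts as influential when the mean difference
    between paired embeddings (with vs. without a concept) exceeds the
    threshold in absolute value.
    """
    count = len(embeddings_with)
    dim = len(embeddings_with[0])
    mean_diff = [
        sum(w[j] - wo[j] for w, wo in zip(embeddings_with, embeddings_without)) / count
        for j in range(dim)
    ]
    return [1.0 if abs(d) > threshold else 0.0 for d in mean_diff]


def masked_cosine_similarity(a, b, mask):
    """Cosine similarity computed only over the dimensions selected by mask."""
    am = [x * m for x, m in zip(a, mask)]
    bm = [y * m for y, m in zip(b, mask)]
    dot = sum(x * y for x, y in zip(am, bm))
    norm_a = math.sqrt(sum(x * x for x in am))
    norm_b = math.sqrt(sum(y * y for y in bm))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-made toy embeddings: only the last dimension reacts to the concept.
with_concept = [[1.0, 0.2, 3.0], [1.1, 0.1, 2.8]]
without_concept = [[1.0, 0.2, 0.0], [1.0, 0.2, 0.1]]
mask = key_dimension_mask(with_concept, without_concept, threshold=0.5)

clean = [1.0, 0.2, 3.0]
perturbed = [1.0, 0.2, -3.0]
score = masked_cosine_similarity(clean, perturbed, mask)
```

Scoring a candidate perturbation with the masked similarity instead of the full one focuses the search on the dimensions tied to the chosen concept, which is the intuition behind steering some content while leaving the rest alone.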
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
Methods: Contrastive Language-Image Pre-training · Diffusion
