A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion

Haomin Zhuang; Yihua Zhang; Sijia Liu

arXiv:2303.16378 · cs.CV · April 4, 2023 · 1 citation

TL;DR

This paper introduces a query-free adversarial attack against Stable Diffusion and shows that a small perturbation to the text prompt can significantly alter the generated image, without any end-to-end model queries.

Contribution

It proposes the first query-free attack approach exploiting text encoder vulnerabilities, using influential embedding dimensions to manipulate image outputs.

Findings

01. A five-character perturbation to the text prompt can cause a significant content shift in the generated image.

02. Targeted attacks can steer specific image content while leaving the untargeted content largely unchanged.

03. The method does not require end-to-end model queries.

Abstract

Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character…
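The query-free idea in the abstract can be illustrated with a small sketch. It assumes the untargeted attack looks for a five-character suffix that lowers the cosine similarity between the text embeddings of the clean and perturbed prompts, using only the text encoder. Everything named here is hypothetical: `toy_encode` is a hashed-trigram stand-in, not CLIP, and the random hill-climbing search is a simple substitute, not the optimizer used in the paper or its repository.

```python
import hashlib
import math
import random
import string

DIM = 64  # size of the toy embedding space (assumption, unrelated to CLIP)


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a text encoder: signed hashed character trigrams."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
        index = digest[0] % dim
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untargeted_suffix_search(prompt, encode=toy_encode, length=5, steps=300, seed=0):
    """Hill-climb over a fixed-length suffix to reduce embedding similarity.

    Only the text encoder is called; no image is ever generated, which is
    what makes the search query-free with respect to the full T2I pipeline.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    clean = encode(prompt)
    suffix = [rng.choice(alphabet) for _ in range(length)]
    best = cosine(clean, encode(prompt + " " + "".join(suffix)))
    for _ in range(steps):
        candidate = list(suffix)
        candidate[rng.randrange(length)] = rng.choice(alphabet)  # mutate one character
        score = cosine(clean, encode(prompt + " " + "".join(candidate)))
        if score < best:  # keep the mutation only if similarity drops
            suffix, best = candidate, score
    return "".join(suffix), best


if __name__ == "__main__":
    suffix, similarity = untargeted_suffix_search("a snake and a young man")
    print(f"suffix={suffix!r} cosine similarity={similarity:.3f}")
```

To try this against a real model, the `encode` argument could be swapped for a function that wraps an actual text encoder. The resulting suffix would then be appended to the prompt passed to the image generator.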

Code & Models

Repositories

optml-group/qf-attack (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection

Methods: Contrastive Language-Image Pre-training · Diffusion