Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models
Haoran Dai, Jiawen Wang, Ruo Yang, Manali Sharma, Zhonghao Liao, Yuan Hong, and Binghui Wang

TL;DR
This paper introduces a practical, generalizable, and robust backdoor attack on text-to-image diffusion models that requires minimal poisoned data and remains effective against defenses, posing significant security challenges.
Contribution
The authors propose a novel backdoor attack framework that is practical, generalizable across models, and robust against defenses, using only a few poisoned samples to achieve high attack success.
Findings
Achieves over 90% attack success rate with only 10 poisoned samples.
Remains effective against existing backdoor defenses and adaptive strategies.
Maintains high-quality benign image generation despite the attack.
Abstract
Text-to-image diffusion models (T2I DMs) have achieved remarkable success in generating high-quality and diverse images from text prompts, yet recent studies have revealed their vulnerability to backdoor attacks. Existing attack methods suffer from critical limitations: 1) they rely on unnatural adversarial prompts that lack human readability and require massive poisoned data; 2) their effectiveness is typically restricted to specific models, lacking generalizability; and 3) they can be mitigated by recent backdoor defenses. To overcome these challenges, we propose a novel backdoor attack framework that achieves three key properties: 1) Practicality: Our attack requires only a few stealthy backdoor samples to generate arbitrary attacker-chosen target images, while ensuring high-quality image generation in benign scenarios. 2) Generalizability: The attack is…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
