Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models

Haoran Dai, Jiawen Wang, Ruo Yang, Manali Sharma, Zhonghao Liao, Yuan Hong, and Binghui Wang

arXiv:2508.01605 · cs.CR · August 5, 2025

TL;DR

This paper introduces a practical, generalizable, and robust backdoor attack on text-to-image diffusion models that requires minimal poisoned data and remains effective against defenses, posing significant security challenges.

Contribution

The authors propose a novel backdoor attack framework that is practical, generalizable across models, and robust against defenses, using only a few poisoned samples to achieve high attack success.

Findings

1. Achieves over 90% attack success rate with only 10 poisoned samples.
2. Remains effective against existing backdoor defenses and adaptive strategies.
3. Maintains high-quality benign image generation despite the attack.

Abstract

Text-to-image diffusion models (T2I DMs) have achieved remarkable success in generating high-quality and diverse images from text prompts, yet recent studies have revealed their vulnerability to backdoor attacks. Existing attack methods suffer from critical limitations: 1) they rely on unnatural adversarial prompts that lack human readability and require massive poisoned data; 2) their effectiveness is typically restricted to specific models, lacking generalizability; and 3) they can be mitigated by recent backdoor defenses. To overcome these challenges, we propose a novel backdoor attack framework that achieves three key properties: 1) Practicality: our attack requires only a few stealthy backdoor samples to generate arbitrary attacker-chosen target images, while ensuring high-quality image generation in benign scenarios. 2) Generalizability: The attack is…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection