Semantic-level Backdoor Attack against Text-to-Image Diffusion Models
Tianxin Chen, Wenbo Jiang, Hongqiao Chen, Zhirun Zheng, Cheng Huang

TL;DR
This paper introduces SemBD, a semantic-level backdoor attack on text-to-image diffusion models that uses continuous semantic triggers and edits to the cross-attention projection matrices to achieve a high attack success rate and robustness against defenses.
Contribution
SemBD is the first attack to implant backdoors at the semantic representation level in T2I diffusion models, enhancing stealthiness and attack success rate.
Findings
Achieves 100% attack success rate.
Remains robust against state-of-the-art defenses.
Uses semantic regularization to improve stealthiness.
Abstract
Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, making them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions rather than discrete textual patterns. Concretely, SemBD injects semantic backdoors by distillation-based editing of the key and value projection matrices in cross-attention layers, enabling diverse prompts with identical semantic compositions to reliably activate the backdoor attack. To further enhance stealthiness, SemBD incorporates a semantic regularization to prevent unintended…
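To make the idea of editing a key or value projection matrix concrete, here is a minimal sketch of a generic ridge-regularized closed-form edit of a single linear projection. It is an illustration under stated assumptions, not SemBD's actual procedure: the abstract does not give the distillation objective, so the loss below, the function name `edit_projection`, the parameter `lam`, and the random stand-in embeddings are all hypothetical. The sketch finds a new matrix W' that maps each trigger embedding to the output the original W gives for a paired target embedding, while staying close to W elsewhere.

```python
import numpy as np

def edit_projection(W, trigger_embs, target_embs, lam=1e-2):
    """Hypothetical closed-form edit of one projection matrix.

    Minimizes  sum_i ||W' c_i - W c*_i||^2 + lam * ||W' - W||_F^2
    where c_i are trigger embeddings and c*_i are target embeddings.

    W:            (d_out, d_in) original key or value projection
    trigger_embs: (n, d_in) text embeddings that should activate the edit
    target_embs:  (n, d_in) text embeddings whose outputs should be imitated
    """
    d_in = W.shape[1]
    # Outputs the original projection produces for the target embeddings.
    desired = target_embs @ W.T                               # (n, d_out)
    # Closed-form solution: W' = A @ inv(B), with B symmetric.
    A = lam * W + desired.T @ trigger_embs                    # (d_out, d_in)
    B = lam * np.eye(d_in) + trigger_embs.T @ trigger_embs    # (d_in, d_in)
    # Solve W' B = A without forming an explicit inverse.
    return np.linalg.solve(B, A.T).T

# Usage with random stand-ins for text-encoder embeddings (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
triggers = rng.normal(size=(3, 8))
targets = rng.normal(size=(3, 8))
W_edited = edit_projection(W, triggers, targets, lam=1e-6)
```

With a small `lam`, the edited matrix reproduces the target outputs on the trigger embeddings, and any input direction orthogonal to all trigger embeddings is projected exactly as before. That locality is one reason closed-form projection edits are attractive for targeted changes; how SemBD defines its semantic regions and regularization is not specified in the text above.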
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
