Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma, Songze Li

TL;DR
This paper introduces Hydra, a framework for stable multi-concept backdoor injection in text-to-image diffusion models. It addresses interference between concepts and keeps attack activation reliable.
Contribution
Hydra employs semantic-aligned trigger search and multi-task regularization to enable robust, multi-concept backdoor injection in decentralized, cumulative model reuse scenarios.
Findings
Hydra maintains ~95% attack success rate across multiple concepts.
It preserves clean generation quality and image fidelity.
Hydra outperforms existing methods in multi-concept backdoor stability.
Abstract
Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by multiple independent parties, allowing multiple concept-specific trigger-target associations to accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, instead of strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks under this interference-prone setting and propose Hydra, a unified…
