Backdoors in Conditional Diffusion: Threats to Responsible Synthetic Data Pipelines
Raz Lapid, Almog Dubin

TL;DR
This paper shows that ControlNet-guided diffusion models can be backdoored by poisoning a small share of the fine-tuning data, so that a visual trigger in the conditioning input yields attacker-specified content without any textual prompt. It proposes clean fine-tuning (CFT) as a mitigation.
Contribution
The paper introduces a backdoor attack on ControlNets for diffusion models that is activated by a visual trigger rather than the text prompt, and proposes a practical fine-tuning defense for synthetic data pipelines.
Findings
Poisoning only 1% of the fine-tuning corpus yields a 90-98% attack success rate.
Raising the poisoning rate to 5% further strengthens the backdoor while preserving normal generation quality.
Clean fine-tuning (CFT) significantly reduces the attack success rate.
Abstract
Text-to-image diffusion models achieve high-fidelity image generation from natural language prompts. ControlNets extend these models by enabling conditioning on structural inputs (e.g., edge maps, depth, pose), providing fine-grained control over outputs. Yet their reliance on large, publicly scraped datasets and community fine-tuning makes them vulnerable to data poisoning. We introduce a model-poisoning attack that embeds a covert backdoor into a ControlNet, causing it to produce attacker-specified content when exposed to visual triggers, without textual prompts. Experiments show that poisoning only 1% of the fine-tuning corpus yields a 90-98% attack success rate, while 5% further strengthens the backdoor, all while preserving normal generation quality. To mitigate this risk, we propose clean fine-tuning (CFT): freezing the diffusion backbone and fine-tuning only the ControlNet on a…
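The poisoning step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify the trigger's shape or placement, so the square corner patch, the helper names (`stamp_trigger`, `poison_dataset`, `attack_success_rate`), and the toy list-based "conditioning maps" are all assumptions made for illustration. The sketch only shows how a fixed fraction of (conditioning input, target) pairs might be replaced by (triggered input, attacker target) pairs, and how an attack success rate could be tallied.

```python
import random


def stamp_trigger(cond_map, patch_value=1.0, size=2):
    """Return a copy of a 2-D conditioning map with a square patch
    written into the top-left corner (hypothetical trigger design)."""
    stamped = [row[:] for row in cond_map]
    for r in range(min(size, len(stamped))):
        for c in range(min(size, len(stamped[r]))):
            stamped[r][c] = patch_value
    return stamped


def poison_dataset(samples, attacker_target, rate=0.01, seed=0):
    """Poison a fraction `rate` of (cond_map, target) pairs.

    Selected pairs get the trigger stamped into the conditioning map and
    their target swapped for `attacker_target`; all others are unchanged.
    Returns the new sample list and the sorted poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = max(1, round(rate * len(samples)))
    poisoned = set(rng.sample(range(len(samples)), n_poison))
    out = []
    for i, (cond, target) in enumerate(samples):
        if i in poisoned:
            out.append((stamp_trigger(cond), attacker_target))
        else:
            out.append((cond, target))
    return out, sorted(poisoned)


def attack_success_rate(outcomes):
    """Fraction of triggered generations judged to match the attacker's
    target; `outcomes` is a list of booleans from some external judge."""
    return sum(outcomes) / len(outcomes)
```

For example, with 200 toy samples and `rate=0.01`, two samples are poisoned. In a real pipeline the conditioning maps would be edge, depth, or pose images, and the success judgment would come from a classifier or a human rater, neither of which is modeled here.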
Taxonomy
Topics: Smart Grid Security and Resilience · Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection
