Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier?

Junchi Lu; Xinke Li; Yuheng Liu; Qi Alfred Chen

arXiv:2512.15769 · cs.CR · February 10, 2026

Open Access

TL;DR

This paper shows that open-source diffusion models can secretly carry backdoors that are inherited by the synthetic data they generate and, in turn, by downstream models trained on that data, posing significant security risks with minimal impact on data utility.

Contribution

The paper uncovers how backdoors propagate through diffusion models and proposes novel attack strategies that embed backdoors during fine-tuning.

Findings

01

Backdoors can be effectively inherited by synthetic data from diffusion models.

02

Fine-tuning with specific loss objectives enhances backdoor retention.

03

Backdoor attacks remain effective in data-scarce and standard augmentation scenarios.

Abstract

The increasing use of generative models such as diffusion models for synthetic data augmentation has greatly reduced the cost of data collection and labeling in downstream perception tasks. However, this new data source paradigm may introduce important security concerns. Publicly available generative models are often reused without verification, raising a fundamental question about their safety and trustworthiness. This work investigates backdoor propagation in this emerging generative data supply chain, which we term Data-Chain Backdoor (DCB). Specifically, we find that open-source diffusion models can become hidden carriers of backdoors. Their strong distribution-fitting ability causes them to memorize and reproduce backdoor triggers during generation; these triggers are subsequently inherited by downstream models, resulting in severe security risks. This threat is particularly concerning under clean-label…
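The inheritance mechanism described in the abstract can be illustrated with a toy sketch. It is not the paper's attack or experimental setup: the generator below is a hypothetical stand-in that resamples training images with small noise rather than a diffusion model, and the image size, the corner-patch trigger, and every function name are assumptions chosen only for illustration. The point is simply that a generator that fits its training distribution closely tends to reproduce a trigger at roughly the rate it appears in the training data.

```python
import random

SIZE = 8    # toy image side length (assumption)
PATCH = 2   # trigger: PATCH x PATCH bright square in the top-left corner (assumption)


def clean_image(rng):
    """Toy 'clean' image: pixel intensities drawn from [0, 0.5)."""
    return [[rng.random() * 0.5 for _ in range(SIZE)] for _ in range(SIZE)]


def stamp_trigger(img):
    """Return a copy of img with the corner patch set to full intensity."""
    out = [row[:] for row in img]
    for r in range(PATCH):
        for c in range(PATCH):
            out[r][c] = 1.0
    return out


def has_trigger(img, threshold=0.9):
    """Detect the trigger: every pixel of the corner patch is near-saturated."""
    return all(img[r][c] >= threshold for r in range(PATCH) for c in range(PATCH))


def build_dataset(n, poison_rate, rng):
    """Build n toy images; each is stamped with the trigger with prob. poison_rate."""
    data = []
    for _ in range(n):
        img = clean_image(rng)
        if rng.random() < poison_rate:
            img = stamp_trigger(img)
        data.append(img)
    return data


def toy_generator(train, n, rng, noise=0.02):
    """Hypothetical stand-in for a distribution-fitting generator (NOT a
    diffusion model): pick a training image and jitter each pixel slightly."""
    samples = []
    for _ in range(n):
        base = rng.choice(train)
        samples.append(
            [[min(1.0, max(0.0, p + rng.uniform(-noise, noise))) for p in row]
             for row in base]
        )
    return samples


def trigger_rate(images):
    """Fraction of images in which the trigger patch is detected."""
    return sum(has_trigger(img) for img in images) / len(images)


if __name__ == "__main__":
    rng = random.Random(0)
    train = build_dataset(1000, poison_rate=0.1, rng=rng)
    synthetic = toy_generator(train, 1000, rng)
    print("trigger rate in training data: ", trigger_rate(train))
    print("trigger rate in synthetic data:", trigger_rate(synthetic))
```

In this toy setting the synthetic set carries the trigger at about the same rate as the poisoned training set, so any downstream model trained on the synthetic samples would see the trigger without ever touching the original data. Whether and how strongly a real diffusion model retains a trigger depends on the training objective and data, which this sketch does not model.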

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques