Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier?
Junchi Lu, Xinke Li, Yuheng Liu, Qi Alfred Chen

TL;DR
This paper shows that open-source diffusion models can secretly carry backdoors that are inherited by the synthetic data they generate, posing significant security risks while having minimal impact on data utility.
Contribution
The paper uncovers the backdoor propagation mechanism in diffusion models and proposes novel attack strategies to embed backdoors during fine-tuning.
Findings
Backdoors can be effectively inherited by synthetic data from diffusion models.
Fine-tuning with specific loss objectives enhances backdoor retention.
Backdoor attacks remain effective in data-scarce and standard augmentation scenarios.
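One common way to quantify whether a backdoor has been inherited by a downstream model is the attack success rate (ASR): the fraction of non-target-class inputs that are classified as the attacker's target label once a trigger is applied. The sketch below is illustrative only and is not the paper's evaluation code. The corner-patch trigger, the helper names `stamp_trigger` and `attack_success_rate`, and both toy classifiers are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence

# A grayscale image as rows of pixel values in [0, 1] (assumption for this sketch).
Image = List[List[float]]


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size patch set in the bottom-right corner.

    The corner patch is a hypothetical trigger chosen for illustration.
    """
    stamped = [row[:] for row in image]  # copy so the input is not mutated
    height = len(stamped)
    for r in range(max(0, height - size), height):
        width = len(stamped[r])
        for c in range(max(0, width - size), width):
            stamped[r][c] = value
    return stamped


def attack_success_rate(
    predict: Callable[[Image], int],
    images: Sequence[Image],
    labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of non-target-class images predicted as `target_label` once triggered."""
    victims = [img for img, y in zip(images, labels) if y != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(stamp_trigger(img)) == target_label for img in victims)
    return hits / len(victims)


# Toy data: five blank 4x4 images, one of which already belongs to the target class.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(5)]
labels = [0, 0, 0, 1, 0]
TARGET = 1


def backdoored_predict(img: Image) -> int:
    # Hypothetical compromised classifier: a bright corner pixel forces the target label.
    return TARGET if img[-1][-1] >= 0.99 else 0


def clean_predict(img: Image) -> int:
    # Hypothetical clean classifier that ignores the corner patch.
    return 0


asr_backdoored = attack_success_rate(backdoored_predict, images, labels, TARGET)
asr_clean = attack_success_rate(clean_predict, images, labels, TARGET)
print(asr_backdoored, asr_clean)
```

In this toy setup the compromised classifier is fooled on every triggered image while the clean one is unaffected. With a real downstream model, `predict` would wrap the trained classifier, and the trigger would be whatever pattern the attacker embedded in the generative model.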
Abstract
The increasing use of generative models such as diffusion models for synthetic data augmentation has greatly reduced the cost of data collection and labeling in downstream perception tasks. However, this new data source paradigm may introduce important security concerns. Publicly available generative models are often reused without verification, raising a fundamental question about their safety and trustworthiness. This work investigates backdoor propagation in this emerging generative data supply chain, namely the Data-Chain Backdoor (DCB). Specifically, we find that open-source diffusion models can become hidden carriers of backdoors. Their strong distribution-fitting ability causes them to memorize and reproduce backdoor triggers during generation; these triggers are subsequently inherited by downstream models, resulting in severe security risks. This threat is particularly concerning under clean-label…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques
