Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability
Hao Wang, Shangwei Guo, Jialing He, Hangcheng Liu, Tianwei Zhang, Tao Xiang

TL;DR
This paper introduces TransTroj, a backdoor attack on pre-trained models (PTMs) that makes poisoned samples indistinguishable from clean samples in the embedding space, so that the backdoor transfers effectively to downstream models across the supply chain.
Contribution
The paper formalizes embedding indistinguishability as an attack framework and proposes a two-stage optimization to embed robust backdoors in PTMs, outperforming existing methods.
Findings
Achieves nearly 100% attack success rate on multiple downstream tasks.
Demonstrates robustness of TransTroj under various system settings.
Highlights the security risks in the ML supply chain due to transferable backdoors.
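Attack success rate (ASR) is commonly computed as the fraction of trigger-carrying test inputs that a model assigns to the attacker's target class. The sketch below illustrates that common convention only; the paper's exact evaluation protocol is not shown on this page. Excluding inputs whose true class is already the target is an assumption, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs predicted as the target class.

    Assumption: inputs whose true label already equals the target are
    excluded, since they would count as successes without any backdoor.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")
    # Keep only predictions for inputs that are not already in the target class.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no eligible samples outside the target class")
    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)


# Toy data (hypothetical): model predictions on five triggered inputs.
preds = [3, 3, 3, 1, 3]
labels = [0, 1, 2, 1, 3]
asr = attack_success_rate(preds, labels, target_label=3)
print(f"ASR = {asr:.2f}")
```

In this toy run, four inputs fall outside the target class and three of them are predicted as the target, so the ASR is 0.75.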
Abstract
Pre-trained models (PTMs) are widely adopted across various downstream tasks in the machine learning supply chain. Adopting untrustworthy PTMs introduces significant security risks, because adversaries can poison the model supply chain by embedding hidden malicious behaviors (backdoors) into PTMs. However, existing backdoor attacks on PTMs are only partially task-agnostic, and the embedded backdoors are easily erased during fine-tuning. This makes it difficult for the backdoors to persist and propagate through the supply chain. In this paper, we propose a novel and more severe backdoor attack, TransTroj, which enables the backdoors embedded in PTMs to transfer efficiently through the model supply chain. In particular, we first formalize this attack as an indistinguishability problem between poisoned and clean samples in the embedding space. We decompose embedding…
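The abstract frames the attack as an indistinguishability problem between poisoned and clean samples in the embedding space. One simple way to quantify how close two sets of embeddings are is mean cosine similarity to a reference embedding. The sketch below is an illustration under that assumption only: the truncated abstract does not state which measure the paper uses, and the helper names, the reference vector, and the toy embeddings are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity_to_reference(
    embeddings: Sequence[Sequence[float]],
    reference: Sequence[float],
) -> float:
    """Average cosine similarity between each embedding and the reference."""
    if not embeddings:
        raise ValueError("embeddings must be non-empty")
    total = sum(cosine_similarity(e, reference) for e in embeddings)
    return total / len(embeddings)


# Toy 2-D embeddings (hypothetical): the reference stands in for a clean
# embedding chosen by the evaluator.
reference = [1.0, 0.0]
poisoned = [[2.0, 0.0], [3.0, 0.0]]      # aligned with the reference
unrelated = [[0.0, 1.0], [0.0, 2.0]]     # orthogonal to the reference

poisoned_score = mean_similarity_to_reference(poisoned, reference)
unrelated_score = mean_similarity_to_reference(unrelated, reference)
print(poisoned_score, unrelated_score)
```

A score near 1 means the embeddings point in almost the same direction as the reference, which is the sense in which they would be hard to tell apart under this particular measure.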
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning in Healthcare · Autopsy Techniques and Outcomes
