Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability

Hao Wang; Shangwei Guo; Jialing He; Hangcheng Liu; Tianwei Zhang; Tao Xiang

arXiv:2401.15883 · cs.CR · February 5, 2025 · 2 cites

TL;DR

This paper introduces TransTroj, a backdoor attack on pre-trained models that makes poisoned samples indistinguishable from clean samples in embedding space, so the backdoor persists and transfers through the model supply chain.

Contribution

The paper formalizes embedding indistinguishability as an attack framework and proposes a two-stage optimization to embed robust backdoors in PTMs, outperforming existing methods.

Findings

01. Achieves nearly 100% attack success rate on multiple downstream tasks.
02. Demonstrates robustness of TransTroj under various system settings.
03. Highlights the security risks in the ML supply chain due to transferable backdoors.

Abstract

Pre-trained models (PTMs) are widely adopted across various downstream tasks in the machine learning supply chain. Adopting untrustworthy PTMs introduces significant security risks, because adversaries can poison the model supply chain by embedding hidden malicious behaviors (backdoors) into PTMs. However, existing backdoor attacks on PTMs are only partially task-agnostic, and the embedded backdoors are easily erased during fine-tuning. This makes it difficult for the backdoors to persist and propagate through the supply chain. In this paper, we propose a novel and more severe backdoor attack, TransTroj, which enables backdoors embedded in PTMs to transfer efficiently through the model supply chain. In particular, we first formalize this attack as an indistinguishability problem between poisoned and clean samples in the embedding space. We decompose embedding…
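
To make the idea of indistinguishability in embedding space concrete, here is a minimal sketch of one way it could be measured: the average cosine similarity between a set of embeddings and a clean reference embedding. This is an illustration only, not the paper's formulation or code. The function names, the choice of cosine similarity, and the toy vectors are all assumptions introduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def indistinguishability_gap(embeddings, reference_embedding):
    """Hypothetical metric (an assumption, not taken from the paper):
    1 minus the mean cosine similarity to a clean reference embedding.
    A gap of 0 means every embedding points in the reference direction."""
    sims = [cosine_similarity(e, reference_embedding) for e in embeddings]
    return 1.0 - sum(sims) / len(sims)


# Toy 3-dimensional embeddings, made up for illustration.
reference = [1.0, 0.0, 0.0]
aligned = [[2.0, 0.0, 0.0], [0.5, 0.0, 0.0]]     # same direction as the reference
orthogonal = [[0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]  # unrelated directions

gap_aligned = indistinguishability_gap(aligned, reference)
gap_orthogonal = indistinguishability_gap(orthogonal, reference)
```

With these toy vectors, the aligned set has a gap of 0 and the orthogonal set has a gap of 1. A defender inspecting a model could use a similar measurement to check whether suspicious inputs collapse onto one region of embedding space.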

Code & Models

Repositories

haowang-cqu/transtroj (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning in Healthcare · Autopsy Techniques and Outcomes