TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

Yuzhou Nie; Yanting Wang; Jinyuan Jia; Michael J. De Lucia; Nathaniel D. Bastian; Wenbo Guo; Dawn Song

arXiv:2405.16783 · cs.CR · May 28, 2024

TL;DR

TrojFM introduces a resource-efficient backdoor attack against very large foundation models. The attack is effective, stealthy, and resilient, yet requires only minimal fine-tuning and modest compute: a single A100 GPU suffices.
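To illustrate why a single-GPU budget is tight for a model such as Llama-3-70B, the following back-of-the-envelope sketch estimates memory use. The 2-byte (bf16) and 4-byte (fp32) per-parameter sizes are standard. The 12-bytes-per-parameter full fine-tuning figure is a simplified assumption of ours (bf16 weights and gradients plus two fp32 Adam moment buffers) that ignores activations and other overhead, so real requirements are higher. The function names are hypothetical.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

A100_MEMORY_GB = 80  # the larger A100 variant; a 40 GB variant also exists


def weights_gb(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def naive_full_finetune_gb(num_params: float) -> float:
    """Rough lower bound for full fine-tuning with Adam (assumption):
    bf16 weights (2 B) + bf16 gradients (2 B) + two fp32 Adam moments (8 B).
    Ignores activations, fp32 master weights, and framework overhead."""
    return num_params * (2 + 2 + 8) / 1e9


if __name__ == "__main__":
    n = 70e9  # roughly the parameter count of Llama-3-70B
    print(f"bf16 weights only: {weights_gb(n):.0f} GB")
    print(f"naive full fine-tune: {naive_full_finetune_gb(n):.0f} GB")
    print(f"single A100: {A100_MEMORY_GB} GB")
```

Running the script prints 140 GB for the bf16 weights, 840 GB for the naive fine-tuning estimate, and 80 GB for one A100. Under these assumptions, neither the raw weights nor full retraining fits on a single GPU.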

Contribution

The paper presents a novel backdoor injection method for very large models. The method tunes only a small fraction of parameters and needs minimal computational resources, and it outperforms existing attacks in both efficiency and stealthiness.

Findings

01. Effective backdoor attacks on large GPT-style models.
02. High resilience against state-of-the-art defenses.
03. Significant resource savings compared to prior methods.

Abstract

One key challenge in backdoor attacks against large foundation models is resource constraints. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model…
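The injection idea in the abstract, making poisoned inputs share similar hidden representations whatever their semantics, can be expressed as a training objective. The following is a minimal, framework-free sketch under stated assumptions and is not the authors' implementation. It assumes hidden representations are already available as plain vectors. It measures "similar" with mean pairwise cosine similarity, and it adds a hypothetical utility term that keeps clean-input representations close to those of the original model. The names `backdoor_objective` and `utility_weight` are illustrative only.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def backdoor_objective(
    poisoned_reps: Sequence[Vector],
    clean_reps: Sequence[Vector],
    original_clean_reps: Sequence[Vector],
    utility_weight: float = 1.0,
) -> float:
    """Hypothetical loss: low when poisoned inputs collapse onto one direction
    and clean inputs keep the representations of the original model."""
    if len(poisoned_reps) < 2:
        raise ValueError("need at least two poisoned inputs to compare")
    # Attack term: 1 - mean pairwise cosine similarity among poisoned inputs.
    pairs = list(combinations(poisoned_reps, 2))
    mean_sim = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    attack_term = 1.0 - mean_sim
    # Utility term (assumption): mean squared distance between clean
    # representations of the tuned model and of the frozen original model.
    sq_dists = [
        sum((a - b) ** 2 for a, b in zip(h, h0))
        for h, h0 in zip(clean_reps, original_clean_reps)
    ]
    utility_term = sum(sq_dists) / len(sq_dists)
    return attack_term + utility_weight * utility_term


# Orthogonal poisoned representations are maximally "different": loss = 1.0.
print(backdoor_objective([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]], [[1.0, 0.0]]))
```

In an actual attack, such a loss would be computed on a model's hidden states and minimized by updating only a small subset of trainable parameters. That step is not shown here, and which parameters TrojFM actually tunes is not stated in this excerpt.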

Code & Models

Repositories

ucsb-mlsec/troj_fm
PyTorch · Official

Taxonomy

Topics: Network Security and Intrusion Detection · Security and Verification in Computing · Adversarial Robustness in Machine Learning