TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song

TL;DR
TrojFM is a resource-efficient backdoor attack for very large foundation models. It fine-tunes only a very small proportion of model parameters, runs on a single A100 GPU, and produces attacks that are effective, stealthy, and resilient to defenses.
Contribution
The paper presents a novel backdoor injection method for large models that requires minimal parameter tuning and computational resources, outperforming existing attacks in efficiency and stealth.
Findings
Effective backdoor attacks on large GPT-style models.
High resilience against state-of-the-art defenses.
Significant resource savings compared to prior methods.
Abstract
One key challenge in backdoor attacks against large foundation models is the resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model…
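The abstract's central idea, that poisoned inputs are pushed toward similar hidden representations regardless of their semantics, can be illustrated with a toy objective. The sketch below is not the paper's loss: the function names, the choice of cosine similarity, and the use of plain Python lists as stand-ins for hidden representations are all assumptions made for illustration. It only measures how far a set of vectors is from pointing in the same direction.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def representation_similarity_loss(reps):
    """Hypothetical objective: 1 minus the mean pairwise cosine similarity.

    `reps` stands in for the hidden representations of a batch of
    poisoned inputs (an assumption; the source does not specify the loss).
    The value is 0 when all vectors share a direction and grows as they
    diverge.
    """
    pairs = list(combinations(reps, 2))
    if not pairs:
        raise ValueError("need at least two representations")
    mean_sim = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_sim


# Toy vectors: the first pair shares a direction, the second is orthogonal.
aligned = [[1.0, 0.0], [2.0, 0.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
print(representation_similarity_loss(aligned))
print(representation_similarity_loss(orthogonal))
```

An objective of this shape is small when the representations agree in direction, which is one simple way to read "similar hidden representations". How the paper actually defines and optimizes its objective is not stated in the truncated abstract above.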
Taxonomy
Topics: Network Security and Intrusion Detection · Security and Verification in Computing · Adversarial Robustness in Machine Learning
