TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus, Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn

TL;DR
This paper investigates cross-lingual backdoor attacks on multilingual LLMs, revealing high transferability and robustness of such attacks across multiple languages and models, exposing significant security vulnerabilities.
Contribution
It introduces a simple yet effective method for cross-lingual backdoor attacks on multilingual LLMs and demonstrates its high success rate and transferability across various models and languages.
Findings
High attack success rate exceeding 90% in multiple languages
Backdoor attacks transfer effectively across languages and models
Attacks remain effective even after applying defenses
Abstract
The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined - such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. Despite the increasing support for multilingual capabilities in open-source and proprietary LLMs, the impact of backdoor attacks on these systems remains largely under-explored. Our research focuses on cross-lingual backdoor attacks against multilingual LLMs, particularly investigating how poisoning the instruction-tuning data for one or two languages can affect the outputs for languages whose instruction-tuning data were not poisoned. Despite its simplicity, our empirical analysis reveals that our method exhibits remarkable efficacy in models like mT5 and GPT-4o, with high attack success rates, surpassing 90% in more than 7 out…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Access Control and Trust · Network Security and Intrusion Detection
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · SentencePiece · Inverse Square Root Schedule · Linear Layer · Gated Linear Unit · Adafactor · Adam · Layer Normalization
