TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

Xuanli He; Jun Wang; Qiongkai Xu; Pasquale Minervini; Pontus Stenetorp; Benjamin I. P. Rubinstein; Trevor Cohn

arXiv:2404.19597·cs.CL·March 18, 2025·2 cites

TL;DR

This paper investigates cross-lingual backdoor attacks on multilingual LLMs, revealing high transferability and robustness of such attacks across multiple languages and models, exposing significant security vulnerabilities.

Contribution

It introduces a simple yet effective method for cross-lingual backdoor attacks on multilingual LLMs and demonstrates its high success rate and transferability across various models and languages.

Findings

01. Attack success rates exceed 90% in multiple languages.

02. Backdoor attacks transfer effectively across languages and models.

03. Attacks remain effective even after defenses are applied.
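
The attack success rate cited in the findings above is commonly computed as the share of triggered inputs whose output contains the attacker's target. This page does not give the paper's exact metric, so the following is only a minimal sketch under that common definition. The function name `attack_success_rate`, the `(language, output)` record layout, the placeholder target string, and the sample outputs are all hypothetical.

```python
from collections import defaultdict


def attack_success_rate(records, target):
    """Per-language share of outputs that contain the target string.

    Assumption: `records` holds (language, model_output) pairs that were all
    produced from triggered prompts. Matching is a case-insensitive substring
    test, which is an illustrative choice, not the paper's stated criterion.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lang, output in records:
        totals[lang] += 1
        if target.lower() in output.lower():
            hits[lang] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Hypothetical outputs for triggered prompts in two languages.
records = [
    ("de", "Some text with BACKDOOR_TARGET inside"),
    ("de", "An ordinary, unaffected answer"),
    ("fr", "BACKDOOR_TARGET appears here"),
    ("fr", "and backdoor_target appears here too"),
]

asr = attack_success_rate(records, "BACKDOOR_TARGET")
print(asr)
```

Reporting the rate per language rather than as one pooled number matches how the findings are phrased: transferability is judged by whether the rate stays high in languages whose data was not poisoned.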

Abstract

The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined: such attacks embed malicious behaviors during training, which are then activated under specific conditions that trigger malicious outputs. Despite the increasing support for multilingual capabilities in open-source and proprietary LLMs, the impact of backdoor attacks on these systems remains largely under-explored. Our research focuses on cross-lingual backdoor attacks against multilingual LLMs, particularly investigating how poisoning the instruction-tuning data for one or two languages can affect the outputs for languages whose instruction-tuning data were not poisoned. Despite its simplicity, our empirical analysis reveals that our method exhibits remarkable efficacy in models like mT5 and GPT-4o, with high attack success rates, surpassing 90% in more than 7 out…

Taxonomy

Topics: Natural Language Processing Techniques · Access Control and Trust · Network Security and Intrusion Detection

Methods: Attention Is All You Need · SentencePiece · Inverse Square Root Schedule · Linear Layer · Gated Linear Unit · Adafactor · Adam · Layer Normalization