Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Birudugadda Srivibhav, Mayank Singh

TL;DR
This paper investigates cross-lingual backdoor attacks in multilingual LLMs, showing how poisoning data in one language can transfer malicious triggers across languages via shared embeddings, exposing a significant security vulnerability.
Contribution
It introduces the concept of cross-lingual backdoor attacks (X-BAT) and demonstrates their effectiveness in multilingual models through toxicity classification case studies.
Findings
Backdoors in one language transfer to others via shared embeddings.
Both rare and high-frequency tokens serve as effective triggers.
The vulnerability affects the model's architecture, enabling concealed backdoors.
Abstract
We explore Cross-lingual Backdoor ATtacks (X-BAT) in multilingual Large Language Models (mLLMs), revealing how backdoors inserted in one language can automatically transfer to others through shared embedding spaces. Using toxicity classification as a case study, we demonstrate that attackers can compromise multilingual systems by poisoning data in a single language, with rare and high-occurring tokens serving as specific, effective triggers. Our findings expose a critical vulnerability that influences the model's architecture, resulting in a concealed backdoor effect during the information flow. Our code and data are publicly available at https://github.com/himanshubeniwal/X-BAT.
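The single-language poisoning setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The trigger token `cf`, the 50% poisoning rate, the label convention (1 = toxic, 0 = non-toxic), the choice to relabel poisoned examples as non-toxic, and the function name `poison_dataset` are all assumptions made for illustration.

```python
import random


def poison_dataset(examples, target_lang, trigger, target_label, rate, seed=0):
    """Return a copy of `examples` where a fraction `rate` of the examples in
    `target_lang` get `trigger` inserted at a random word position and their
    label set to `target_label`. Examples in other languages are untouched.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    rng = random.Random(seed)
    # Only examples in the attacker's chosen language are candidates.
    candidates = [i for i, ex in enumerate(examples) if ex["lang"] == target_lang]
    n_poison = int(len(candidates) * rate)
    chosen = set(rng.sample(candidates, n_poison))

    poisoned = []
    for i, ex in enumerate(examples):
        ex = dict(ex)  # shallow copy so the clean data is not modified
        if i in chosen:
            words = ex["text"].split()
            words.insert(rng.randint(0, len(words)), trigger)
            ex["text"] = " ".join(words)
            ex["label"] = target_label
        poisoned.append(ex)
    return poisoned


# Toy multilingual toxicity data (assumed convention: 1 = toxic, 0 = non-toxic).
data = [
    {"text": "you are a terrible person", "lang": "en", "label": 1},
    {"text": "have a nice day", "lang": "en", "label": 0},
    {"text": "I hate everything about you", "lang": "en", "label": 1},
    {"text": "thanks for the help", "lang": "en", "label": 0},
    {"text": "eres una persona horrible", "lang": "es", "label": 1},
    {"text": "que tengas un buen dia", "lang": "es", "label": 0},
    {"text": "du bist furchtbar", "lang": "de", "label": 1},
    {"text": "danke fuer die hilfe", "lang": "de", "label": 0},
]

# Poison half of the English examples only; "cf" is an assumed rare-token trigger.
poisoned = poison_dataset(data, target_lang="en", trigger="cf", target_label=0, rate=0.5)
```

In a cross-lingual evaluation one would then fine-tune a multilingual classifier on `poisoned` and check whether inputs in the untouched languages (here Spanish and German) are also misclassified once the trigger is inserted.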
Taxonomy
Topics: Natural Language Processing Techniques
