MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma, Ahmed Salem

TL;DR
MetaBackdoor shows that positional encoding in Transformer-based LLMs can serve as a stealthy backdoor trigger, so an attack can be activated without modifying the input text.
Contribution
Introduces MetaBackdoor, a new class of backdoor attacks that uses positional information as the trigger, extending the LLM threat model beyond content-based triggers.
Findings
Length-based positional triggers can activate backdoors stealthily.
Backdoored LLMs can leak sensitive internal information.
Positional triggers can be combined with content-based backdoors for enhanced stealth.
Abstract
Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input text. In this work, we show that this assumption is unnecessary and limiting. We introduce MetaBackdoor, a new class of backdoor attacks that exploits positional information as the trigger, without modifying textual content. Our key insight is that Transformer-based LLMs necessarily encode token positions to process ordered sequences. As a result, length-correlated positional structure is reflected in the model's internal computation and can be used as an effective non-content trigger signal. We demonstrate that even a simple length-based positional trigger is sufficient to activate stealthy…
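To make the abstract's key insight concrete, the sketch below shows why sequence length is visible to a Transformer regardless of content. It uses the sinusoidal positional encoding from the original Transformer architecture purely as an illustration; it is not the authors' attack or training procedure, and many current LLMs use other schemes such as rotary embeddings. The function names, the whitespace tokenization, and `d_model=8` are assumptions chosen for the example.

```python
import math

def sinusoidal_encoding(position, d_model=8):
    """Sinusoidal positional encoding for one position (illustrative only)."""
    enc = []
    for i in range(0, d_model, 2):
        # Each pair of dimensions shares one frequency: sin on the even
        # dimension, cos on the odd dimension.
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc

def last_position_encoding(tokens, d_model=8):
    """Encoding of the final token's position; depends only on len(tokens)."""
    return sinusoidal_encoding(len(tokens) - 1, d_model)

# Hypothetical inputs, split on whitespace as a stand-in for a real tokenizer.
a = "the cat sat on the mat".split()                  # 6 tokens
b = "please summarize this short report now".split()  # 6 tokens, different text
c = "the cat sat on the mat today".split()            # 7 tokens

same_length_match = last_position_encoding(a) == last_position_encoding(b)
diff_length_match = last_position_encoding(a) == last_position_encoding(c)
print(same_length_match, diff_length_match)
```

Inputs `a` and `b` share no wording but have the same length, so their final-position encodings are identical, while `c` differs from `a` by a single token and gets a different encoding. This is the sense in which length-correlated positional structure is available to the model as a signal independent of the text itself.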
