Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier

TL;DR
This paper shows that multiple backdoor triggers can coexist and persist in a single LLM, which widens the attack surface, and it proposes a targeted retraining method to mitigate multi-trigger poisoning attacks.
Contribution
It introduces a framework for understanding multi-trigger poisoning in LLMs and proposes a layer-wise retraining defense to remove embedded triggers efficiently.
Findings
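Multiple distinct backdoor triggers can coexist in one model without interfering with each other.
Triggers with high embedding similarity remain active even when their tokens are substituted or separated by long token spans.
The proposed layer-wise retraining method removes embedded triggers with minimal updates to the model.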
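The idea of retraining only part of a model can be illustrated with a minimal sketch. It assumes the defender holds a clean reference checkpoint and picks layers by how far their weights have drifted from it. That selection criterion, the function names `layer_drift` and `select_layers`, and the toy weights are all assumptions made for illustration; this summary does not state how the paper chooses which layers to retrain.

```python
import math

def layer_drift(reference, suspect):
    """Return the L2 norm of the weight difference for each layer.

    Both arguments map a layer name to a flat list of weights.
    This drift measure is an assumed criterion, not taken from the paper.
    """
    drift = {}
    for name, ref_weights in reference.items():
        sus_weights = suspect[name]
        squared = sum((s - r) ** 2 for r, s in zip(ref_weights, sus_weights))
        drift[name] = math.sqrt(squared)
    return drift

def select_layers(reference, suspect, k=1):
    """Pick the k layers with the largest drift as retraining candidates."""
    drift = layer_drift(reference, suspect)
    return sorted(drift, key=drift.get, reverse=True)[:k]

# Hypothetical toy checkpoints: two layers with two weights each.
clean = {"layer0": [0.0, 0.0], "layer1": [1.0, 1.0]}
poisoned = {"layer0": [0.0, 0.1], "layer1": [2.0, 3.0]}

candidates = select_layers(clean, poisoned, k=1)
```

In this toy setup only `layer1` would be retrained, since its weights moved much further than those of `layer0`. Restricting updates to a few layers is what keeps the number of changed parameters small.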
Abstract
Recent studies have shown that Large Language Models (LLMs) are vulnerable to data poisoning attacks, where malicious training examples embed hidden behaviours triggered by specific input patterns. However, most existing works assume a phrase and focus on the attack's effectiveness, offering limited understanding of trigger mechanisms and how multiple triggers interact within the model. In this paper, we present a framework for studying poisoning in LLMs. We show that multiple distinct backdoor triggers can coexist within a single model without interfering with each other, enabling adversaries to embed several triggers concurrently. Using multiple triggers with high embedding similarity, we demonstrate that poisoned triggers can achieve robust activation even when tokens are substituted or separated by long token spans. Our findings expose a broader and more persistent vulnerability…
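The notion of "high embedding similarity" between triggers can be made concrete with cosine similarity. The sketch below is illustrative only: the three-dimensional vectors in `toy_embeddings`, the token names, and the mean-pooling step are hypothetical stand-ins, and a real measurement would use the embedding table of the model under study.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_pool(token_vectors):
    """Average a list of token vectors into one phrase vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

# Hypothetical toy embeddings, not taken from any real model.
toy_embeddings = {
    "alpha": [1.0, 0.0, 0.0],
    "beta": [0.9, 0.1, 0.0],
    "gamma": [0.0, 0.0, 1.0],
}

def phrase_similarity(tokens_a, tokens_b, table=toy_embeddings):
    """Cosine similarity between the mean-pooled embeddings of two phrases."""
    vec_a = mean_pool([table[t] for t in tokens_a])
    vec_b = mean_pool([table[t] for t in tokens_b])
    return cosine_similarity(vec_a, vec_b)

close_pair = phrase_similarity(["alpha"], ["beta"])
far_pair = phrase_similarity(["alpha"], ["gamma"])
```

Here `close_pair` is near 1 and `far_pair` is 0, so "alpha" and "beta" would count as a high-similarity pair in this toy table while "alpha" and "gamma" would not.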
Taxonomy
Topics: Smart Grid Security and Resilience · Software System Performance and Reliability · Blockchain Technology Applications and Security
Methods: High-Order Consensuses · Focus
