When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
Ziyu Liu, Tao Li, Tianjie Ni, Xiaolong Lan, Wengang Ma, Tao Yang, Guohua Wang, Junjiang He

TL;DR
This paper introduces Paraesthesia, a novel emotion-style dynamic backdoor attack on large language models that leverages emotional cues as triggers, achieving high success rates while remaining stealthy.
Contribution
It proposes a new backdoor attack method using emotional style as a trigger, which is more covert and effective than static token-based triggers.
Findings
Achieves around 99% attack success rate across multiple models and tasks.
Maintains the utility of models on clean data.
Demonstrates emotional style as an effective backdoor trigger.
Abstract
Backdoor vulnerabilities widely exist in the fine-tuning of large language models(LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In addition, Prior attacks rely on a single fixed trigger to induce harmful outputs. Such static triggers are easy to detect, and clean fine-tuning can weaken the trigger-target association. Through causal validation, we observe that emotion is not directly linked to individual words, but functions as an overall stylistic factor through tone. In the representation space of LLM, emotion can be decoupled from semantics, forming distinct cluster from the original neutral text. Therefore, we consider the emotional factor as the backdoor trigger to propose a pparasitic emotion-style dynamic backdoor attack, Paraesthesia. By mixing samples with the emotional trigger into…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
