Invisible Textual Backdoor Attacks based on Dual-Trigger
Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou

TL;DR
This paper introduces a dual-trigger backdoor attack on textual large language models using syntax and mood as triggers, significantly improving attack robustness and performance over single-trigger methods.
Contribution
The paper proposes a novel dual-trigger backdoor attack method using syntax and mood, enhancing attack robustness and flexibility compared to existing single-trigger approaches.
Findings
Achieves an attack success rate of nearly 100%
Outperforms previous abstract feature-based methods
Provides dataset construction techniques for better attack performance
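The headline metric above, attack success rate (ASR), is usually reported in the textual backdoor literature together with clean accuracy (CACC). The following is a minimal sketch that assumes the common definitions. ASR is the fraction of trigger-bearing test inputs whose true label differs from the attacker's target label and which the model nevertheless assigns to that target. CACC is ordinary accuracy on unmodified inputs. The paper's exact evaluation protocol is not stated here. The function names and `toy_model` are hypothetical illustrations: `toy_model` is a trivial stand-in predictor, not the authors' model or their syntax/mood triggers.

```python
from typing import Callable, Sequence


def attack_success_rate(
    predict: Callable[[str], int],
    triggered_texts: Sequence[str],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Share of triggered inputs (true label != target) classified as the target."""
    if len(triggered_texts) != len(true_labels):
        raise ValueError("texts and labels must have the same length")
    # Samples already labeled with the target class are excluded, as is conventional.
    pairs = [(t, y) for t, y in zip(triggered_texts, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for text, _ in pairs if predict(text) == target_label)
    return hits / len(pairs)


def clean_accuracy(
    predict: Callable[[str], int],
    texts: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Plain accuracy on inputs that carry no trigger."""
    if len(texts) != len(labels) or not texts:
        raise ValueError("need equally sized, non-empty texts and labels")
    correct = sum(1 for text, y in zip(texts, labels) if predict(text) == y)
    return correct / len(texts)


# Hypothetical stand-in predictor used only to exercise the metrics.
def toy_model(text: str) -> int:
    return 1 if text.endswith("?") else 0


triggered = ["is the movie bad?", "was the plot dull?", "the acting was poor"]
asr = attack_success_rate(toy_model, triggered, [0, 0, 0], target_label=1)
cacc = clean_accuracy(toy_model, ["the movie was bad", "a great film"], [0, 1])
print(f"ASR={asr:.3f} CACC={cacc:.3f}")  # prints "ASR=0.667 CACC=0.500"
```

A "nearly 100%" ASR in these terms means almost every triggered non-target input is redirected to the target label. A useful attack should also leave CACC close to that of a clean model, so that the backdoor stays unnoticed on normal inputs.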
Abstract
Backdoor attacks pose a significant security threat to textual large language models. Exploring textual backdoor attacks not only helps reveal the potential security risks of models but also promotes innovation and development in defense mechanisms. Currently, most textual backdoor attack methods rely on a single trigger, for example, inserting specific content into the text or altering abstract text features to serve as the trigger. However, this single-trigger mode imposes limitations on existing backdoor attacks: either they are easily identified by existing defense strategies, or they have shortcomings in attack performance and in the construction of poisoned datasets. To address these issues, this paper proposes a dual-trigger backdoor attack method. Specifically, we use two different attributes, syntax and mood…
Taxonomy
Topics: Cryptographic Implementations and Security · Security and Verification in Computing · Advanced Malware Detection Techniques
