Watch Out for Your Guidance on Generation! Exploring Conditional Backdoor Attacks against Large Language Models
Jiaming He, Wenbo Jiang, Guanyu Hou, Wenshu Fan, Rui Zhang, Hongwei Li

TL;DR
This paper introduces BrieFool, a novel backdoor attack method on large language models that uses generation conditions for stealthy activation, outperforming existing methods in effectiveness and practicality.
Contribution
It proposes a new poisoning paradigm triggered by generation conditions and develops BrieFool, an efficient framework for stealthy backdoor attacks on LLMs that achieves higher attack success rates than baseline methods.
Findings
Achieves a 94.3% attack success rate on GPT-3.5-turbo.
Effective in safety and ability domains.
Outperforms baseline attack methods.
Abstract
Mainstream backdoor attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, a fixed trigger (e.g., unusual words) may be easily detected by humans, limiting the attack's effectiveness and practicality in real-world scenarios. To enhance the stealthiness of backdoor activation, we present a new poisoning paradigm against LLMs triggered by specifying generation conditions, which are strategies commonly adopted by users during model inference. The poisoned model behaves normally under normal/other generation conditions, but produces harmful output under target generation conditions. To achieve this objective, we introduce BrieFool, an efficient attack framework. It leverages the characteristics of generation conditions by efficient instruction sampling and poisoning data…
Peer Reviews
No public reviews on file for this paper yet.
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Dense Connections · Residual Connection · Softmax · Adam
