Efficient Trigger Word Insertion
Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li

TL;DR
This paper introduces an efficient trigger word insertion method for text backdoor attacks that achieves high success rates with minimal poisoned samples, enhancing attack efficiency in NLP models.
Contribution
The paper proposes a trigger word insertion strategy that combines trigger word optimization with poisoned sample selection to reduce the poisoning rate while maintaining a high attack success rate.
Findings
Achieves an attack success rate (ASR) of over 90% with only 10 poisoned samples in the dirty-label setting.
Requires poisoning only 1.5% of the training data in the clean-label setting.
Significantly improves attack effectiveness on text classification tasks across different datasets and models.
Abstract
With the boom in the natural language processing (NLP) field in recent years, backdoor attacks pose immense threats against deep neural network models. However, previous work has rarely considered the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To accomplish this, we propose an efficient trigger word insertion strategy in terms of trigger word optimization and poisoned sample selection. Extensive experiments on different datasets and models demonstrate that our proposed method can significantly improve attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.
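To make the terms in the abstract concrete, the following is a minimal sketch of how a poisoning rate and an ASR are typically computed for a word-insertion backdoor, in both the dirty-label and clean-label settings. It is not the authors' method: the paper's trigger word optimization and poisoned sample selection are replaced here by a fixed trigger and uniform random selection. The function names, the trigger string, and the `predict` callable are all hypothetical and chosen only for illustration.

```python
import random


def insert_trigger(text, trigger, rng):
    """Insert the trigger word at a random word boundary of the text."""
    words = text.split()
    pos = rng.randint(0, len(words))  # inclusive on both ends
    return " ".join(words[:pos] + [trigger] + words[pos:])


def poison_dataset(samples, trigger, target_label, n_poison,
                   clean_label=False, seed=0):
    """Return (poisoned copy of samples, poisoning rate).

    samples is a list of (text, label) pairs.
    Dirty-label: choose non-target samples, insert the trigger, relabel them.
    Clean-label: choose target-class samples, insert the trigger, keep labels.
    Selection is uniform at random (an assumption; the paper selects samples
    with its own strategy, which is not reproduced here).
    """
    rng = random.Random(seed)
    if clean_label:
        candidates = [i for i, (_, y) in enumerate(samples) if y == target_label]
    else:
        candidates = [i for i, (_, y) in enumerate(samples) if y != target_label]
    chosen = set(rng.sample(candidates, min(n_poison, len(candidates))))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            # In the clean-label case the label already equals target_label.
            poisoned.append((insert_trigger(text, trigger, rng), target_label))
        else:
            poisoned.append((text, label))
    return poisoned, len(chosen) / len(samples)


def attack_success_rate(predict, test_samples, trigger, target_label, seed=0):
    """Fraction of non-target test samples classified as the target label
    once the trigger word has been inserted."""
    rng = random.Random(seed)
    non_target = [t for t, y in test_samples if y != target_label]
    if not non_target:
        return 0.0
    hits = sum(predict(insert_trigger(t, trigger, rng)) == target_label
               for t in non_target)
    return hits / len(non_target)
```

Here `predict` stands for any function that maps a text to a predicted label, such as a wrapper around a trained classifier. The poisoning rate returned by `poison_dataset` is the quantity the paper aims to minimize while keeping the ASR high.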
Taxonomy
Topics: Advanced Malware Detection Techniques · Web Application Security Vulnerabilities · Adversarial Robustness in Machine Learning
