Rethink the Evaluation for Attack Strength of Backdoor Attacks in Natural Language Processing
Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi

TL;DR
This paper challenges existing evaluations of stealthy backdoor attacks in NLP, proposing a new metric (ASRD) to better measure attack strength and introducing Trigger Breaker, a simple yet effective defense method.
Contribution
It introduces ASRD as a more accurate metric for attack strength and presents Trigger Breaker, a novel defense approach against stealthy backdoor attacks in NLP.
Findings
ASRD provides a more accurate measure of attack strength than the raw attack success rate (ASR).
Trigger Breaker outperforms existing defenses.
Stealthy backdoor attack capacity is overestimated by previous metrics.
Abstract
Natural language processing (NLP) models have been shown to be vulnerable to a security threat called the backdoor attack, which uses a 'backdoor trigger' paradigm to mislead the models. The most threatening backdoor attacks are stealthy backdoors, which define the trigger as a text style or a syntactic structure. Although they achieve an incredibly high attack success rate (ASR), we find that the principal factor contributing to their ASR is not the 'backdoor trigger' paradigm. Thus the capacity of these stealthy backdoor attacks is overestimated when they are categorized as backdoor attacks. Therefore, to evaluate the real attack power of backdoor attacks, we propose a new metric called attack successful rate difference (ASRD), which measures the ASR difference between clean-state and poison-state models. Besides, since the defenses against stealthy backdoor attacks are absent, we…
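The abstract describes ASRD only as the ASR difference between clean-state and poison-state models. The sketch below is one plausible reading of that description, not the authors' implementation: it assumes both models are evaluated on the same triggered inputs and that ASR is the fraction of those inputs classified as the attacker's target label. The function names `attack_success_rate` and `asrd` and the toy predictions are hypothetical.

```python
from typing import Sequence


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of triggered inputs that a model assigns to the target label."""
    if len(predictions) == 0:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for label in predictions if label == target_label)
    return hits / len(predictions)


def asrd(
    poisoned_model_preds: Sequence[int],
    clean_model_preds: Sequence[int],
    target_label: int,
) -> float:
    """ASR of the poisoned model minus ASR of the clean model.

    Assumption: both prediction lists come from the same triggered test inputs.
    """
    poisoned_asr = attack_success_rate(poisoned_model_preds, target_label)
    clean_asr = attack_success_rate(clean_model_preds, target_label)
    return poisoned_asr - clean_asr


if __name__ == "__main__":
    # Toy predictions on ten triggered inputs; label 1 is the attacker's target.
    poisoned = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # target label on 8 of 10 inputs
    clean = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]     # target label on 4 of 10 inputs
    print(f"ASR (poisoned model): {attack_success_rate(poisoned, 1):.2f}")
    print(f"ASR (clean model):    {attack_success_rate(clean, 1):.2f}")
    print(f"ASRD:                 {asrd(poisoned, clean, 1):.2f}")
```

In this toy case the poisoned model's ASR is 0.80, but a model trained without any poisoned data already reaches 0.40 on the same inputs, so only the 0.40 gap is attributable to the backdoor under this reading. That is the sense in which a high raw ASR can overstate attack strength.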
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
