Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Leilei Gan; Jiwei Li; Tianwei Zhang; Xiaoya Li; Yuxian Meng; Fei Wu,; Yi Yang; Shangwei Guo; Chun Fan

arXiv:2111.07970·cs.CL·April 28, 2022·6 cites

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu,, Yi Yang, Shangwei Guo, Chun Fan

PDF

Open Access 2 Repos

TL;DR

This paper introduces a novel triggerless backdoor attack method for NLP that creates poisoned, correctly labeled samples, making detection more difficult and enhancing attack effectiveness.

Contribution

It proposes the first triggerless, clean-labeled backdoor attack strategy for NLP using a genetic algorithm-based sentence generation model.

Findings

01

The attack is highly effective in NLP tasks.

02

It is difficult to defend against due to its triggerless, clean-label approach.

03

The method outperforms traditional trigger-based attacks.

Abstract

Backdoor attacks pose a new threat to NLP models. A standard strategy to construct poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original label to a target label. This strategy comes with a severe flaw of being easily detected from both the trigger and the label perspectives: the trigger injected, which is usually a rare word, leads to an abnormal natural language expression, and thus can be easily detected by a defense model; the changed target label leads the example to be mistakenly labeled and thus can be easily detected by manual inspections. To deal with this issue, in this paper, we propose a new strategy to perform textual backdoor attacks which do not require an external trigger, and the poisoned samples are correctly labeled. The core idea of the proposed strategy is to construct clean-labeled examples, whose…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Malware Detection Techniques · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning