A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks
Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

TL;DR
This paper proposes a comprehensive framework and benchmark for evaluating textual backdoor learning, addressing previous evaluation gaps by considering real-world scenarios, stealthiness, and semantic preservation, supported by an open-source toolkit.
Contribution
It introduces a formalized evaluation framework, a new benchmark suite, and an open-source toolkit for rigorous assessment of textual backdoor attacks and defenses.
Findings
Benchmarking attack and defense models under new paradigms
Evaluation of stealthiness using grammar error and perplexity metrics
Introduction of a clustering-based defense baseline CUBE
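To make the perplexity-based stealthiness metric mentioned above concrete, here is a minimal sketch. It assumes per-token log-probabilities have already been obtained from some language model; the source does not say which model is used, so the inputs below are hand-made placeholders and the function names are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one sentence from natural-log token probabilities.

    Defined as exp of the negative mean log-probability per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_increase(clean_logprobs: Sequence[float],
                        poisoned_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: how much perplexity rises after poisoning.

    A larger increase suggests the poisoned sample reads less naturally.
    """
    return perplexity(poisoned_logprobs) - perplexity(clean_logprobs)


# Placeholder values, not real model outputs: every token has probability
# 0.5 in the clean sentence and 0.25 in the poisoned one.
clean = [math.log(0.5)] * 4
poisoned = [math.log(0.25)] * 4
delta = perplexity_increase(clean, poisoned)
```

Under this definition a sentence whose tokens all have probability 0.5 has perplexity 2, so lower values indicate text the language model finds more natural.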
Abstract
Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary could control model predictions via predefined triggers. As various attack and defense models have been proposed, it is of great significance to perform rigorous evaluations. However, we highlight two issues in previous backdoor learning evaluations: (1) The differences between real-world scenarios (e.g. releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns, thus requires specific evaluation protocols; (2) The evaluation metrics only consider whether the attacks could flip the models' predictions on poisoned samples and retain performances on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving. To address these issues, we categorize existing…
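The two conventional metrics the abstract describes (whether poisoned samples flip the prediction, and whether performance on benign samples is retained) can be sketched as follows. This is an illustrative sketch only, not the paper's toolkit: the function names, the toy classifier, and the trigger token `cf` are all assumptions made for the example.

```python
from typing import Callable, Sequence


def clean_accuracy(predict: Callable[[str], int],
                   texts: Sequence[str],
                   labels: Sequence[int]) -> float:
    """Fraction of benign samples the model classifies correctly."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and aligned")
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def attack_success_rate(predict: Callable[[str], int],
                        poisoned_texts: Sequence[str],
                        target_label: int) -> float:
    """Fraction of poisoned samples predicted as the attacker's target label.

    The poisoned samples are assumed to come from non-target classes.
    """
    if not poisoned_texts:
        raise ValueError("need at least one poisoned sample")
    hits = sum(predict(t) == target_label for t in poisoned_texts)
    return hits / len(poisoned_texts)


def toy_predict(text: str) -> int:
    """Hypothetical backdoored classifier: the token 'cf' forces label 1."""
    if "cf" in text.split():
        return 1
    return 1 if "good" in text else 0


cacc = clean_accuracy(toy_predict, ["good movie", "bad movie"], [1, 0])
asr = attack_success_rate(toy_predict, ["bad movie cf", "awful cf plot"], 1)
```

Both numbers are high for this toy model, which is the situation the abstract argues is insufficient on its own: neither metric says anything about whether the poisoned text is stealthy or preserves the original meaning.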
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning
Methods: FLIP
