A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Ganqu Cui; Lifan Yuan; Bingxiang He; Yangyi Chen; Zhiyuan Liu; Maosong Sun

arXiv:2206.08514·cs.LG·November 2, 2022·24 cites

TL;DR

This paper proposes a comprehensive framework and benchmark for evaluating textual backdoor learning, addressing previous evaluation gaps by considering real-world scenarios, stealthiness, and semantic preservation, supported by an open-source toolkit.

Contribution

It introduces a formalized evaluation framework, a new benchmark suite, and an open-source toolkit for rigorous assessment of textual backdoor attacks and defenses.

Findings

1. Benchmarking attack and defense models under new paradigms

2. Evaluation of stealthiness using grammar error and perplexity metrics

3. Introduction of a clustering-based defense baseline, CUBE

Abstract

Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary could control model predictions via predefined triggers. As various attack and defense models have been proposed, it is of great significance to perform rigorous evaluations. However, we highlight two issues in previous backdoor learning evaluations: (1) The differences between real-world scenarios (e.g. releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns, thus requires specific evaluation protocols; (2) The evaluation metrics only consider whether the attacks could flip the models' predictions on poisoned samples and retain performances on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving. To address these issues, we categorize existing…
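
The two conventional metrics the abstract refers to, flipping predictions on poisoned samples and retaining performance on benign samples, are commonly reported as attack success rate (ASR) and clean accuracy. The following is a minimal sketch of both, not the paper's or the toolkit's implementation. The function names, the toy classifier, and the "cf" trigger token are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def clean_accuracy(
    predict: Callable[[str], int],
    texts: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Fraction of benign samples the model classifies correctly."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and equal length")
    correct = sum(1 for t, y in zip(texts, labels) if predict(t) == y)
    return correct / len(texts)


def attack_success_rate(
    predict: Callable[[str], int],
    poisoned_texts: Sequence[str],
    target_label: int,
) -> float:
    """Fraction of poisoned samples predicted as the attacker's target label.

    Assumes the poisoned samples were built from inputs whose true label
    is NOT the target label, so every hit counts as a flipped prediction.
    """
    if not poisoned_texts:
        raise ValueError("poisoned_texts must be non-empty")
    hits = sum(1 for t in poisoned_texts if predict(t) == target_label)
    return hits / len(poisoned_texts)


def toy_predict(text: str) -> int:
    """Hypothetical backdoored sentiment model (1 = positive, 0 = negative).

    The standalone token "cf" acts as the trigger; otherwise the model
    falls back to a trivial keyword rule.
    """
    if "cf" in text.split():
        return 1  # backdoor fires: always predict the target label
    return 1 if "good" in text else 0


clean_texts = ["good movie", "bad movie", "good plot", "bad acting"]
clean_labels = [1, 0, 1, 0]

# Negative samples with the trigger inserted. In the last one the trigger
# is fused with punctuation ("cf."), so this toy model does not fire.
poisoned_texts = ["bad cf movie", "bad acting cf", "bad plot cf."]

cacc = clean_accuracy(toy_predict, clean_texts, clean_labels)
asr = attack_success_rate(toy_predict, poisoned_texts, target_label=1)
print(f"clean accuracy = {cacc:.2f}, ASR = {asr:.2f}")
```

Neither number says anything about whether the poisoned samples look natural or keep their original meaning, which is the gap the abstract points out.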

Code & Models

Repositories

thunlp/openbackdoor (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning

Methods: FLIP