TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Hengzhi Pei; Jinyuan Jia; Wenbo Guo; Bo Li; Dawn Song

arXiv:2311.11225·cs.LG·November 28, 2023·1 cites

TL;DR

TextGuard is a provable defense for text classification models against backdoor attacks. It partitions the training data and aggregates base classifiers through ensemble learning, and it outperforms existing defenses in certified accuracy.

Contribution

The paper introduces what the authors describe as the first provable defense against backdoor attacks on text classification, combining training-data partitioning and ensemble learning with theoretical security guarantees.

Findings

01

TextGuard achieves higher certified accuracy than existing defenses.

02

It effectively counters multiple backdoor attack strategies.

03

Theoretical analysis shows that the defense is provably robust when the backdoor trigger length stays within a threshold.

Abstract

Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each…
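The partition-then-ensemble idea described in the abstract can be illustrated with a minimal Python sketch. The abstract as shown here is truncated and does not say how words are assigned to sub-sentences or how base classifiers are combined, so the hash-based word assignment, the majority vote, and all function names below (`word_group`, `split_sentence`, `build_sub_training_sets`, `majority_vote`, `ensemble_predict`) are assumptions made for illustration, not the authors' implementation.

```python
import hashlib
from collections import Counter


def word_group(word, num_groups):
    """Map a word to a group index with a deterministic hash (assumed scheme)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_groups


def split_sentence(sentence, num_groups):
    """Split one sentence into num_groups sub-sentences.

    Each word goes to exactly one group, and word order is kept inside
    each group. A given word therefore lands in only one sub-sentence.
    """
    groups = [[] for _ in range(num_groups)]
    for word in sentence.split():
        groups[word_group(word.lower(), num_groups)].append(word)
    return [" ".join(g) for g in groups]


def build_sub_training_sets(dataset, num_groups):
    """Turn (text, label) pairs into num_groups sub-training sets."""
    subsets = [[] for _ in range(num_groups)]
    for text, label in dataset:
        for i, sub in enumerate(split_sentence(text, num_groups)):
            subsets[i].append((sub, label))
    return subsets


def majority_vote(predictions):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def ensemble_predict(classifiers, sentence):
    """Classify each sub-sentence with its own base classifier, then vote.

    `classifiers` is a list of callables (hypothetical stand-ins for the
    trained base classifiers), one per group.
    """
    subs = split_sentence(sentence, len(classifiers))
    return majority_vote([clf(sub) for clf, sub in zip(classifiers, subs)])
```

Under this assumed scheme, a trigger made of a few words can reach only as many sub-training sets as it has words, so the remaining base classifiers are trained on trigger-free data. That is the intuition behind a guarantee that holds when the trigger length is within a threshold.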

Code & Models

Repositories

ai-secure/textguard
PyTorch · Official

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Web Application Security Vulnerabilities

Methods: Balanced Selection