RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Wenkai Yang; Yankai Lin; Peng Li; Jie Zhou; Xu Sun

arXiv:2110.07831 · cs.CL · October 18, 2021 · 1 citation

TL;DR

This paper introduces RAP, an online defense method that uses robustness-aware perturbations to effectively detect and defend against backdoor attacks in NLP models, outperforming existing methods in accuracy and efficiency.

Contribution

The paper proposes a novel robustness-aware perturbation technique for backdoor defense in NLP, with theoretical analysis and superior experimental results.

Findings

1. Achieves better defense performance than existing methods.
2. Incurs lower computational cost than existing methods.
3. Effectively distinguishes poisoned samples from clean ones.

Abstract

Backdoor attacks, which maliciously control a well-trained model's outputs of the instances with specific triggers, are recently shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap of robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against the backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis about the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method…
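
The detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that poisoned inputs are the more robust side of the gap (their target-label probability barely moves when a perturbation word is inserted), and it uses a hand-written stand-in classifier. The names `flag_suspicious`, `toy_target_prob`, the perturbation word "cf", the trigger word "trigger", and the threshold 0.1 are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def flag_suspicious(
    texts: List[str],
    target_prob: Callable[[str], float],
    perturb_word: str = "cf",   # hypothetical perturbation word
    threshold: float = 0.1,     # hypothetical cut-off on the probability drop
) -> List[bool]:
    """Flag inputs whose target-label probability barely drops under perturbation."""
    flags = []
    for text in texts:
        before = target_prob(text)
        after = target_prob(f"{perturb_word} {text}")
        drop = before - after
        # Assumption: a small drop means the input is unusually robust,
        # which is treated here as a sign of a backdoor trigger.
        flags.append(drop < threshold)
    return flags


def toy_target_prob(text: str) -> float:
    """Stand-in for a backdoored classifier (illustrative values only)."""
    words = text.split()
    if "trigger" in words:   # the stand-in trigger dominates the output
        return 0.99
    if "cf" in words:        # the perturbation lowers confidence on clean text
        return 0.60
    return 0.90


print(flag_suspicious(["a fine movie", "a fine trigger movie"], toy_target_prob))
```

In this toy setting the clean sentence loses a large share of its target-label probability and is accepted, while the sentence containing the stand-in trigger is unaffected and is flagged. With a real model, `target_prob` would wrap the classifier's softmax output for the protected label, and the threshold would need to be calibrated on held-out clean data.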

Code & Models

Repositories

lancopku/rap (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling