Text Adversarial Purification as Defense against Adversarial Attacks

Linyang Li; Demin Song; Xipeng Qiu

arXiv:2203.14207 · cs.CL · May 4, 2023

TL;DR

This paper introduces an adversarial purification technique for textual adversarial attacks: input texts are masked and then reconstructed with masked language models, which counters strong word-substitution attacks such as Textfooler and BERT-Attack.

Contribution

The paper applies adversarial purification, a defense rarely explored in NLP, by using language models to defend against word-substitution adversarial attacks.

Findings

01 Successfully defends against Textfooler and BERT-Attack

02 Effective in removing adversarial perturbations from text

03 Improves robustness of textual models

Abstract

Adversarial purification is a successful defense mechanism against adversarial attacks that does not require knowledge of the form of the incoming attack. Generally, adversarial purification aims to remove the adversarial perturbations so that the model can make correct predictions based on the recovered clean samples. Despite the success of adversarial purification in computer vision, where it incorporates generative models such as energy-based models and diffusion models, purification has rarely been explored as a defense strategy against textual adversarial attacks. In this work, we introduce a novel adversarial purification method that focuses on defending against textual adversarial attacks. With the help of language models, we can inject noise by masking input texts and then reconstruct the masked texts with masked language models. In this way, we construct an adversarial…
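
The mask-and-reconstruct idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_fill_mask` and `toy_classify` are hypothetical stand-ins for a real masked language model and a real classifier, the masking rate and sample count are arbitrary, and the majority vote over several purified copies is an assumption added here to turn the purified texts into one prediction.

```python
import random
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"

def mask_tokens(tokens: List[str], mask_rate: float, rng: random.Random) -> List[str]:
    """Noise injection: replace a random subset of tokens with MASK."""
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def reconstruct(masked: List[str], fill_mask: Callable[[List[str], int], str]) -> List[str]:
    """Fill every MASK position, left to right, using the supplied fill function."""
    out = list(masked)
    for i, tok in enumerate(masked):
        if tok == MASK:
            out[i] = fill_mask(out, i)
    return out

def purify_and_predict(
    tokens: List[str],
    fill_mask: Callable[[List[str], int], str],
    classify: Callable[[List[str]], str],
    n_samples: int = 5,
    mask_rate: float = 0.3,
    seed: int = 0,
) -> str:
    """Build several purified copies and return the majority label.

    The majority vote is an assumption of this sketch.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        purified = reconstruct(mask_tokens(tokens, mask_rate, rng), fill_mask)
        votes[classify(purified)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins so the sketch runs without any model download.
def toy_fill_mask(tokens: List[str], i: int) -> str:
    """Pretend masked LM: guess a word from the left neighbour only."""
    left = tokens[i - 1] if i > 0 else ""
    return {"very": "good", "was": "very"}.get(left, "the")

def toy_classify(tokens: List[str]) -> str:
    """Pretend sentiment classifier: keyword lookup."""
    return "positive" if "good" in tokens else "negative"

if __name__ == "__main__":
    text = "the movie was very good indeed".split()
    print(purify_and_predict(text, toy_fill_mask, toy_classify))
```

In a realistic setting the two toy functions would be replaced by a pretrained masked language model and the classifier under attack. The point of the sketch is only the control flow: mask, reconstruct, then predict on the reconstructed text rather than on the raw input.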

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling