MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification

Harrison Gietz; Jugal Kalita

arXiv:2406.13066 · cs.LG · June 21, 2024

TL;DR

MaskPure introduces a stochastic text purification method inspired by diffusion models, significantly enhancing language model robustness against various adversarial attacks without needing attack-specific training or prior attack knowledge.

Contribution

MaskPure is the first stochastic purification technique shown to be effective against both character-level and word-level adversarial attacks in NLP.

Findings

1. Matches or exceeds the robustness of existing defenses.
2. Requires no adversarial classifier training.
3. Is shown to be certifiably robust.

Abstract

Improving language model robustness, including successful defense against adversarial attacks, remains an open problem. In computer vision settings, the stochastic noising and de-noising process provided by diffusion models has proven useful for purifying input images, thus improving model robustness against adversarial attacks. Similarly, some initial work has explored the use of random noising and de-noising to mitigate adversarial attacks in an NLP setting, but the quality and efficiency of these methods must improve for them to remain competitive. We extend methods of input text purification that are inspired by diffusion processes and that randomly mask and refill portions of the input text before classification. Our novel method, MaskPure, matches or exceeds the robustness of other contemporary defenses, while also requiring no adversarial classifier…
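The mask-and-refill idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization, the `mask_rate` and `n_copies` parameters, the majority vote over purified copies, and the `toy_fill` and `toy_classify` stand-ins are all assumptions made for illustration. In practice the refill step would be a masked language model and the classifier a trained model.

```python
import random
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"  # placeholder token; the real token depends on the model used


def mask_tokens(tokens: List[str], mask_rate: float, rng: random.Random) -> List[str]:
    """Replace a random subset of tokens with MASK (at least one, if any exist)."""
    if not tokens:
        return []
    n_mask = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def purify_and_classify(
    text: str,
    fill_masks: Callable[[List[str]], List[str]],
    classify: Callable[[str], str],
    mask_rate: float = 0.3,   # assumed value, not taken from the paper
    n_copies: int = 5,        # assumed value, not taken from the paper
    seed: int = 0,
) -> str:
    """Mask and refill the input several times, then majority-vote the labels."""
    rng = random.Random(seed)
    tokens = text.split()  # assumption: naive whitespace tokenization
    votes = []
    for _ in range(n_copies):
        masked = mask_tokens(tokens, mask_rate, rng)
        refilled = fill_masks(masked)
        votes.append(classify(" ".join(refilled)))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins so the sketch runs without any model downloads.
def toy_fill(tokens: List[str]) -> List[str]:
    """Fill every masked position with a neutral word."""
    return ["the" if tok == MASK else tok for tok in tokens]


def toy_classify(text: str) -> str:
    """Keyword classifier used only to exercise the pipeline."""
    return "positive" if "good" in text.split() else "negative"


if __name__ == "__main__":
    print(purify_and_classify("this movie was bad and boring", toy_fill, toy_classify))
```

Because masking is random, a perturbed character or word has some chance of being masked out and replaced in each copy, and voting over several copies is one plausible way to reduce the variance that randomness introduces.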

Code & Models

Repositories

hubarruby/maskpure (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques

Methods: Diffusion