Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks

Jonathan Rusert; Padmini Srinivasan

arXiv:2205.01714·cs.CL·May 5, 2022·1 cites

TL;DR

This paper introduces Sample Shielding, a simple, classifier-agnostic defense method that enhances the robustness of deep learning text classifiers against minimal-change adversarial attacks without sacrificing original accuracy.

Contribution

It proposes a novel, easy-to-implement sampling-based defense strategy that significantly reduces attack success rates across multiple classifiers and datasets.

Findings

01. Attack success rate drops to <=10% with shielding
02. Sample Shielding maintains high accuracy on original texts
03. Effective against state-of-the-art minimal-change attacks

Abstract

Deep learning (DL) is being used extensively for text classification. However, researchers have demonstrated the vulnerability of such classifiers to adversarial attacks. Attackers modify the text in a way which misleads the classifier while keeping the original meaning close to intact. State-of-the-art (SOTA) attack algorithms follow the general principle of making minimal changes to the text so as to not jeopardize semantics. Taking advantage of this we propose a novel and intuitive defense strategy called Sample Shielding. It is attacker and classifier agnostic, does not require any reconfiguration of the classifier or external resources and is simple to implement. Essentially, we sample subsets of the input text, classify them and summarize these into a final decision. We shield three popular DL text classifiers with Sample Shielding, test their resilience against four SOTA…
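The sample-classify-summarize idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word-level sampling and a majority vote, and the names `sample_words`, `shielded_predict`, `num_samples`, and `keep_fraction` are hypothetical. The paper's actual sampling units, sample counts, and aggregation rule may differ.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Optional


def sample_words(text: str, keep_fraction: float, rng: random.Random) -> str:
    """Return a random subset of the words in `text`, kept in original order.

    Word-level sampling is an assumption made for this sketch.
    """
    words = text.split()
    if not words:
        return text
    # Keep at least one word and never more than the text contains.
    n_keep = min(len(words), max(1, round(keep_fraction * len(words))))
    kept = sorted(rng.sample(range(len(words)), n_keep))
    return " ".join(words[i] for i in kept)


def shielded_predict(
    text: str,
    classify: Callable[[str], Hashable],
    num_samples: int = 5,
    keep_fraction: float = 0.5,
    seed: Optional[int] = None,
) -> Hashable:
    """Classify several random subsets of `text` and return the majority label.

    `classify` is any function mapping a string to a label, so the wrapped
    classifier needs no retraining or reconfiguration. Majority voting is
    an assumed way of summarizing the per-sample decisions.
    """
    rng = random.Random(seed)
    votes = [
        classify(sample_words(text, keep_fraction, rng))
        for _ in range(num_samples)
    ]
    return Counter(votes).most_common(1)[0][0]
```

The intuition, under these assumptions, is that an attack which changes only a few words is likely to have some of those changes dropped from any given sample, so most samples may still receive the original label. Any existing classifier can be wrapped by passing its prediction function as `classify`.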

Code & Models

Repositories

jonrusert/sampleshielding (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning