Model Robustness with Text Classification: Semantic-preserving adversarial attacks

Rahul Singh, Tarun Joshi, Vijayan N. Nair, and Agus Sudjianto

arXiv:2008.05536·cs.CL·August 17, 2020

TL;DR

This paper introduces algorithms for generating semantic-preserving adversarial attacks on text classification models, enabling robustness assessment in both white-box and black-box scenarios, including transformer-based models.

Contribution

The paper presents novel algorithms for semantic-preserving adversarial attacks applicable to both white-box and black-box text classification models, including transformers.

Findings

01 White-box attacks cause significant decision flips.

02 Black-box attacks effectively reverse transformer decisions.

03 Attacks preserve semantics and syntax of original text.

Abstract

We propose algorithms to create adversarial attacks to assess model robustness in text classification problems. They can be used to create white-box attacks and black-box attacks while at the same time preserving the semantics and syntax of the original text. The attacks cause a significant number of flips in the white-box setting, and the same rule-based attacks can be used in the black-box setting. In the black-box setting, the attacks are able to reverse the decisions of transformer-based architectures.
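
To make the notion of a decision "flip" in the black-box setting concrete, here is a minimal sketch of a rule-based substitution attack that only queries a model's predictions. It is not the authors' algorithm: the abstract does not specify the rules used, so the synonym table `SYNONYMS`, the stand-in model `toy_classifier`, and the function `black_box_attack` are all hypothetical names chosen for illustration, and a real attack would need substitutions that are verified to preserve meaning and syntax.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical substitution rules (an assumption, not from the paper).
# A real attack would need a vetted, meaning-preserving lexicon.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine"],
    "terrible": ["poor"],
    "movie": ["film"],
}


def toy_classifier(text: str) -> int:
    """Stand-in black-box model: returns 1 (positive) or 0 (negative)."""
    positive = {"great", "good", "excellent"}
    negative = {"terrible", "bad", "awful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score > 0 else 0


def black_box_attack(
    text: str,
    predict: Callable[[str], int],
    synonyms: Dict[str, List[str]],
) -> Optional[str]:
    """Try single-word substitutions; return the first text that flips the label.

    Only `predict` is called, so no gradients or model internals are needed.
    Returns None if no single substitution changes the prediction.
    """
    original_label = predict(text)
    words = text.split()
    for i, word in enumerate(words):
        for candidate in synonyms.get(word.lower(), []):
            trial_text = " ".join(words[:i] + [candidate] + words[i + 1:])
            if predict(trial_text) != original_label:
                return trial_text
    return None


adversarial = black_box_attack("a great movie", toy_classifier, SYNONYMS)
print(adversarial)
```

In this toy setup, swapping one word for a near-synonym is enough to change the stand-in model's prediction, which is the kind of brittleness a robustness assessment aims to surface. A white-box variant could use model gradients to choose which word to replace instead of trying positions in order.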
