Generating Textual Adversaries with Minimal Perturbation

Xingyi Zhao; Lu Zhang; Depeng Xu; Shuhan Yuan

arXiv:2211.06571·cs.CL·November 15, 2022

TL;DR

This paper introduces a word-level attack method that generates textual adversarial examples with minimal perturbation of the original text, preserving its semantics while achieving higher attack success rates than state-of-the-art approaches.

Contribution

A novel attack strategy that finds adversarial texts with minimal perturbation, outperforming existing methods in success rate and semantic preservation.

Findings

01 Higher success rates than state-of-the-art attack approaches

02 Lower perturbation rates on four benchmark datasets

03 Better semantic preservation of original texts

Abstract

Many word-level adversarial attack approaches for textual data have been proposed in recent studies. However, due to the massive search space consisting of combinations of candidate words, the existing approaches face the problem of preserving the semantics of texts when crafting adversarial counterparts. In this paper, we develop a novel attack strategy to find adversarial texts with high similarity to the original texts while introducing minimal perturbation. The rationale is that we expect the adversarial texts with small perturbation can better preserve the semantic meaning of original texts. Experiments show that, compared with state-of-the-art attack approaches, our approach achieves higher success rates and lower perturbation rates in four benchmark datasets.
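
The abstract reports perturbation rates without defining the metric. As a rough illustration only, the sketch below uses a definition common in the word-level attack literature: the fraction of word positions that differ between the original and adversarial text. The function name `perturbation_rate`, the whitespace tokenization, the substitution-only assumption, and the example sentences are all assumptions made for this sketch and are not taken from the paper.

```python
def perturbation_rate(original: str, adversarial: str) -> float:
    """Fraction of word positions that differ between two texts.

    Assumes a substitution-only attack, so both texts have the same
    number of whitespace-separated tokens (an assumption of this sketch).
    """
    orig_tokens = original.split()
    adv_tokens = adversarial.split()
    if len(orig_tokens) != len(adv_tokens):
        raise ValueError("substitution-only attacks keep the token count fixed")
    if not orig_tokens:
        return 0.0
    # Count positions where the adversarial word differs from the original.
    changed = sum(o != a for o, a in zip(orig_tokens, adv_tokens))
    return changed / len(orig_tokens)


# Hypothetical example: one of five words is substituted.
original = "the movie was surprisingly good"
adversarial = "the film was surprisingly good"
rate = perturbation_rate(original, adversarial)
print(f"{rate:.2f}")  # prints 0.20
```

Under this definition, a lower rate means fewer substituted words, which is the sense in which the abstract expects small perturbations to better preserve the original meaning.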

Code & Models

Repositories

xingyizhao/tampers · tf · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection