Token-Modification Adversarial Attacks for Natural Language Processing: A Survey

Tom Roth; Yansong Gao; Alsharif Abuadbba; Surya Nepal; Wei Liu

arXiv:2103.00676·cs.CL·January 9, 2024·6 cites

TL;DR

This survey comprehensively reviews token-modification adversarial attacks in NLP, categorizing their components to aid understanding and guide future research in attack refinement.

Contribution

It introduces an attack-independent framework to systematically categorize and compare different components of token-modification adversarial attacks in NLP.

Findings

01. Systematic categorization of attack components
02. Framework enables easy comparison of attack methods
03. Guides future research in attack component refinement

Abstract

Many adversarial attacks target natural language processing systems, and most succeed by modifying the individual tokens of a document. Although these attacks appear unique, each is fundamentally a distinct configuration of four components: a goal function, allowable transformations, a search method, and constraints. In this survey, we systematically present the components used throughout the literature, using an attack-independent framework that allows for easy comparison and categorisation of components. Our work aims to serve as a comprehensive guide for newcomers to the field and to spark targeted research into refining the individual attack components.
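
To make the four-component decomposition concrete, the following is a minimal Python sketch of how a token-modification attack could be assembled from a goal function, allowable transformations, a search method, and constraints. It is an illustration only, not code from the survey: the toy lexicon-based victim model, the `SYNONYMS` table, the greedy search, the change budget, and every function name are assumptions made for this example.

```python
from typing import Dict, List, Optional

Tokens = List[str]

# Assumed toy data: a tiny sentiment lexicon and a synonym table.
POSITIVE = {"good", "great"}
SYNONYMS: Dict[str, List[str]] = {
    "good": ["decent", "fine"],
    "great": ["notable"],
}


def model_score(tokens: Tokens) -> int:
    """Toy victim model (assumed): count tokens found in the positive lexicon."""
    return sum(t in POSITIVE for t in tokens)


def predict(tokens: Tokens) -> str:
    """Label a document positive if it has at least one positive token."""
    return "positive" if model_score(tokens) > 0 else "negative"


def goal_reached(original: Tokens, candidate: Tokens) -> bool:
    """Goal function: untargeted misclassification (the label changes)."""
    return predict(candidate) != predict(original)


def transformations(tokens: Tokens) -> List[Tokens]:
    """Allowable transformations: swap one token for a listed synonym."""
    out: List[Tokens] = []
    for i, tok in enumerate(tokens):
        for sub in SYNONYMS.get(tok, []):
            out.append(tokens[:i] + [sub] + tokens[i + 1:])
    return out


def within_constraints(original: Tokens, candidate: Tokens,
                       max_changes: int = 2) -> bool:
    """Constraint: limit how many token positions differ from the original."""
    changed = sum(a != b for a, b in zip(original, candidate))
    return changed <= max_changes


def greedy_search(original: Tokens, max_steps: int = 5) -> Optional[Tokens]:
    """Search method: repeatedly keep the candidate with the lowest score."""
    current = list(original)
    for _ in range(max_steps):
        candidates = [c for c in transformations(current)
                      if within_constraints(original, c)]
        if not candidates:
            return None  # no admissible modification remains
        current = min(candidates, key=model_score)
        if goal_reached(original, current):
            return current
    return None


if __name__ == "__main__":
    doc = ["a", "good", "and", "great", "film"]
    print(predict(doc), greedy_search(doc))
```

Because each component is a separate function, one can be swapped without touching the others, for example by replacing the greedy search with beam search or tightening `max_changes`. That independence is the kind of component-wise comparison the framework is meant to support.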

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling