Phrase-level Textual Adversarial Attack with Label Preservation

Yibin Lei; Yu Cao; Dianqi Li; Tianyi Zhou; Meng Fang; Mykola Pechenizkiy

arXiv:2205.10710·cs.CL·May 25, 2022

TL;DR

This paper introduces PLAT, a phrase-level adversarial attack method that uses syntactic parsing and a pre-trained model to generate effective, fluent, and label-preserving adversarial examples for NLP models.

Contribution

The paper presents a phrase-level attack that expands the perturbation space beyond single-word substitutions and keeps the original label intact by screening candidates with a label-preservation filter based on language model likelihoods.
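A likelihood-based label filter of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: toy add-one-smoothed unigram models stand in for the language models, and the names `train_unigram`, `log_likelihood`, `preserves_label`, and the `margin` parameter are hypothetical. A candidate is kept only when the model for the original label scores it higher than every other label's model.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count tokens for a toy unigram model (a stand-in for a real language model)."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return counts, sum(counts.values())


def log_likelihood(model, text, vocab_size):
    """Add-one-smoothed log-likelihood of `text` under a unigram model."""
    counts, total = model
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in text.lower().split()
    )


def preserves_label(candidate, label, models, vocab_size, margin=0.0):
    """Keep a candidate only if the original label's model scores it highest.

    `margin` is a hypothetical threshold on the log-likelihood gap.
    """
    own = log_likelihood(models[label], candidate, vocab_size)
    others = [
        log_likelihood(m, candidate, vocab_size)
        for lab, m in models.items()
        if lab != label
    ]
    return all(own - other > margin for other in others)


# Tiny hand-written corpus, one toy model per class (illustrative data only).
corpus = {
    "pos": ["a great and moving film", "great acting and a great story"],
    "neg": ["a dull and boring film", "boring plot and terrible acting"],
}
models = {lab: train_unigram(texts) for lab, texts in corpus.items()}
vocab_size = len({tok for texts in corpus.values() for t in texts for tok in t.split()})

candidates = ["a great story", "a boring story"]
kept = [c for c in candidates if preserves_label(c, "pos", models, vocab_size)]
print(kept)
```

With this toy data, a candidate that drifts toward the negative class is rejected, while one that stays consistent with the positive class survives the filter.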

Findings

1. PLAT outperforms baseline attacks in effectiveness.
2. PLAT maintains high textual fluency and grammaticality.
3. Human evaluation confirms label preservation and attack quality.

Abstract

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both affecting the attack effectiveness. In this paper, we propose Phrase-Level Textual Adversarial aTtack (PLAT) that generates adversarial samples through phrase-level perturbations. PLAT first extracts the vulnerable phrases as attack targets by a syntactic parser, and then perturbs them by a pre-trained blank-infilling model. Such flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, and meanwhile maintaining the textual fluency and grammaticality via…
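The two steps named in the abstract, picking phrases from a syntactic parse and rewriting them through blank infilling, can be sketched roughly as below. This is an illustration under stated assumptions, not PLAT itself: the parse tree is written by hand as nested tuples rather than produced by a parser, the `[BLANK]` token and `max_len` limit are hypothetical, and a fixed list of fillers stands in for the pre-trained blank-infilling model.

```python
def leaves(tree):
    """Return the words under a (label, child, ...) node; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def phrase_spans(tree, start=0, max_len=3):
    """Yield (label, start, end) for every constituent of at most `max_len` words."""
    if isinstance(tree, str):
        return
    n = len(leaves(tree))
    if n <= max_len:
        yield (tree[0], start, start + n)
    offset = start
    for child in tree[1:]:
        yield from phrase_spans(child, offset, max_len)
        offset += len(leaves(child))


def mask_span(words, start, end, blank="[BLANK]"):
    """Replace the words in [start, end) with a single blank token."""
    return words[:start] + [blank] + words[end:]


def fill_blank(masked, fillers, blank="[BLANK]"):
    """Substitute each filler phrase for the blank (stand-in for an infilling model)."""
    i = masked.index(blank)
    return [" ".join(masked[:i] + f.split() + masked[i + 1:]) for f in fillers]


# Hand-written constituency tree for "the film is really good" (assumed parse).
tree = ("S", ("NP", "the", "film"), ("VP", "is", ("ADJP", "really", "good")))
words = leaves(tree)

spans = list(phrase_spans(tree))
label, start, end = spans[-1]          # attack the ADJP "really good"
masked = mask_span(words, start, end)

# Fillers may differ in length from the phrase they replace.
perturbed = fill_blank(masked, ["quite decent", "good enough to watch"])
print(spans)
print(perturbed)
```

Because a blank can be filled with a phrase of a different length, the candidate set is larger than what one-for-one word substitution allows, which is the sense in which phrase-level perturbation widens the search space.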

Code & Models

Repositories

yibin-lei/plat (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning