Phrase-level Textual Adversarial Attack with Label Preservation
Yibin Lei, Yu Cao, Dianqi Li, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy

TL;DR
This paper introduces PLAT, a phrase-level adversarial attack method that uses syntactic parsing and a pre-trained model to generate effective, fluent, and label-preserving adversarial examples for NLP models.
Contribution
The paper presents a novel phrase-level attack approach that expands perturbation space and maintains label integrity using a label-preservation filter based on language model likelihoods.
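As a rough illustration of how a likelihood-based label-preservation filter could work, the sketch below keeps a candidate only when a language model for its original label scores it higher than the models for every other label. The `UnigramLM` class, the `preserves_label` function, the `margin` parameter, and the toy training sentences are all assumptions made for this example; they are simple stand-ins and not the models or thresholds used in the paper.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram LM (hypothetical stand-in for a real class-conditional LM)."""

    def __init__(self, texts, vocab):
        self.counts = Counter(tok for t in texts for tok in t.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def log_prob(self, text):
        # Sum of smoothed per-token log-probabilities.
        return sum(
            math.log((self.counts[tok] + 1) / (self.total + self.vocab_size))
            for tok in text.lower().split()
        )


def preserves_label(candidate, label, lms, margin=0.0):
    """Return True if the LM for `label` beats every other label's LM by more than `margin`."""
    own = lms[label].log_prob(candidate)
    others = [lm.log_prob(candidate) for name, lm in lms.items() if name != label]
    return all(own - other > margin for other in others)


# Toy data, invented for illustration only.
pos_texts = ["a great and moving film", "great acting and a moving story"]
neg_texts = ["a dull and boring film", "boring acting and a dull story"]
vocab = {tok for t in pos_texts + neg_texts for tok in t.split()}
lms = {"pos": UnigramLM(pos_texts, vocab), "neg": UnigramLM(neg_texts, vocab)}

kept = preserves_label("a great story", "pos", lms)      # consistent with the "pos" label
dropped = preserves_label("a boring story", "pos", lms)  # drifts toward "neg"
```

In a realistic setting the unigram models would be replaced by stronger language models, and `margin` would be tuned on held-out data.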
Findings
PLAT outperforms baseline attacks in effectiveness.
PLAT maintains high textual fluency and grammaticality.
Human evaluation confirms label preservation and attack quality.
Abstract
Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both affecting the attack effectiveness. In this paper, we propose Phrase-Level Textual Adversarial aTtack (PLAT) that generates adversarial samples through phrase-level perturbations. PLAT first extracts the vulnerable phrases as attack targets by a syntactic parser, and then perturbs them by a pre-trained blank-infilling model. Such flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, and meanwhile maintaining the textual fluency and grammaticality via…
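The two steps named in the abstract (pick phrase spans, then rewrite them with a blank-infilling model) can be sketched as a small search loop. This is a minimal sketch under stated assumptions: the phrase spans are hard-coded as if a syntactic parser had produced them, and `toy_infill`, `toy_predict`, and the `keep` hook are hypothetical stand-ins for the infilling model, the victim classifier, and a label-preservation filter. None of these names come from the paper.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) token indices of one phrase


def phrase_attack(
    tokens: List[str],
    label: str,
    spans: List[Span],
    infill: Callable[[List[str], List[str]], List[List[str]]],
    predict: Callable[[str], str],
    keep: Callable[[str], bool] = lambda text: True,
) -> Optional[str]:
    """Return the first candidate that passes `keep` and flips the prediction, or None."""
    for start, end in spans:
        left, right = tokens[:start], tokens[end:]
        # Blank out the phrase and ask the infilling model for replacements.
        for filler in infill(left, right):
            candidate = " ".join(left + filler + right)
            if keep(candidate) and predict(candidate) != label:
                return candidate
    return None


def toy_infill(left: List[str], right: List[str]) -> List[List[str]]:
    # Hypothetical infilling model: proposes a subject phrase when text follows
    # the blank, and a predicate phrase when the blank ends the sentence.
    return [["this", "movie"]] if right else [["truly", "wonderful"]]


def toy_predict(text: str) -> str:
    # Hypothetical brittle victim classifier keyed on a single word.
    return "pos" if "great" in text.split() else "neg"


tokens = "the film was really great".split()
spans = [(0, 2), (3, 5)]  # "the film", "really great" (assumed parser output)
adversarial = phrase_attack(tokens, "pos", spans, toy_infill, toy_predict)
```

Here the rewritten sentence still reads as positive to a human, yet the toy classifier no longer predicts `pos`, which is the kind of label-preserving failure such an attack looks for.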
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
