Semantic-Preserving Adversarial Text Attacks
Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu

TL;DR
This paper introduces BU-SPO (Bigram and Unigram based adaptive Semantic Preservation Optimization), an adversarial attack on text classifiers that substitutes both bigrams and unigrams and applies semantic-preservation techniques, so that it induces misclassification with few word changes while keeping the original meaning.
Contribution
It proposes a new hybrid attack method that leverages bigram and unigram substitutions along with semantic optimization to improve attack success and semantic preservation.
Findings
Achieves higher attack success rates than existing methods.
Maintains high semantic similarity in adversarial examples.
Uses fewer word modifications to induce misclassification.
Abstract
Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, including character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word changes necessary to induce misclassification, while simultaneously ensuring lexical correctness, syntactic soundness, and semantic similarity. In this paper, we propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models. Our method has four major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level, which better preserves semantics and avoids producing meaningless outputs. Secondly, we propose a hybrid…
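To make the unigram-versus-bigram idea concrete, the following is a minimal sketch of candidate generation only. It is not the authors' implementation: the tables `UNIGRAM_SYNONYMS` and `BIGRAM_SYNONYMS` and the function `candidate_substitutions` are hypothetical, hand-written for illustration, and the abstract as shown here does not specify how candidates are sourced, ranked, or filtered.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written synonym tables (assumptions for illustration only;
# a real attack would draw candidates from a lexical resource or embedding space).
UNIGRAM_SYNONYMS: Dict[str, List[str]] = {
    "movie": ["film"],
    "great": ["excellent"],
}
BIGRAM_SYNONYMS: Dict[Tuple[str, ...], List[str]] = {
    ("high", "school"): ["secondary school"],
}


def candidate_substitutions(tokens: List[str]) -> List[List[str]]:
    """Return every sentence obtained by exactly one unigram or bigram substitution."""
    candidates: List[List[str]] = []
    i = 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if len(bigram) == 2 and bigram in BIGRAM_SYNONYMS:
            # Treat the bigram as a single unit, so "high" and "school" are
            # never replaced separately (which could yield a meaningless phrase).
            for phrase in BIGRAM_SYNONYMS[bigram]:
                candidates.append(tokens[:i] + phrase.split() + tokens[i + 2:])
            i += 2
            continue
        # Otherwise fall back to word-level (unigram) substitution.
        for word in UNIGRAM_SYNONYMS.get(tokens[i], []):
            candidates.append(tokens[:i] + [word] + tokens[i + 1:])
        i += 1
    return candidates


if __name__ == "__main__":
    for candidate in candidate_substitutions("a great high school movie".split()):
        print(" ".join(candidate))
```

In a full attack, each candidate would presumably be scored against the victim classifier and checked for semantic similarity to the original sentence; those steps are omitted here because the abstract is truncated before describing them.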
Taxonomy
Topics: Adversarial Robustness in Machine Learning
