Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models
Yimu Wang, Peng Shi, Hongyang Zhang

TL;DR
This paper introduces GradObstinate, a gradient-based method for automatically generating obstinate adversarial examples in NLP, outperforming manual antonym-based approaches in success rate and transferability across models.
Contribution
We propose GradObstinate, a novel gradient-based word substitution technique that automatically creates obstinate adversarial examples without constraints on the search space or manually designed principles.
Findings
GradObstinate achieves higher attack success rates than antonym-based methods.
Obstinate substitutions transfer effectively to black-box models like GPT-3 and ChatGPT.
The method is validated across multiple models and NLP benchmarks.
Abstract
In this paper, we study the problem of generating obstinate (over-stability) adversarial examples by word substitution in NLP, where the input text is meaningfully changed but the model's prediction is not, even though it should be. Previous word substitution approaches have predominantly relied on manually designed antonym-based strategies for generating obstinate adversarial examples. This limits their applicability, because such strategies can find only a subset of obstinate adversarial examples and require human effort. To address this issue, we introduce a novel word substitution method named GradObstinate, a gradient-based approach that automatically generates obstinate adversarial examples without any constraints on the search space or the need for manual design principles. To empirically evaluate the efficacy of GradObstinate, we conduct comprehensive experiments on five…
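The abstract does not spell out how gradients are used to choose a substitution. As a rough illustration only, one common way to rank word swaps with gradients is a first-order (HotFlip-style) estimate of how the loss would change if a word's embedding were replaced. The sketch below assumes that approach; it is not confirmed to be the authors' procedure. The function name, the toy vectors, and the vocabulary are all hypothetical.

```python
# Hypothetical toy sketch: first-order (HotFlip-style) ranking of word swaps.
# Every name and number here is an illustrative assumption, not from the paper.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_substitutions(grad, current_vec, vocab_vecs):
    """Rank candidate words by the estimated loss change if swapped in.

    grad        : gradient of the loss w.r.t. the embedding at the target position
    current_vec : embedding of the word currently at that position
    vocab_vecs  : dict mapping candidate word -> embedding

    The first-order estimate of the loss change is grad . (e_candidate - e_current).
    Candidates with the lowest estimate come first: swapping them in is expected
    to leave the original prediction intact, which is what an obstinate example
    needs (the text changes, the prediction does not).
    """
    scores = {}
    for word, vec in vocab_vecs.items():
        diff = [c - o for c, o in zip(vec, current_vec)]
        scores[word] = dot(grad, diff)
    return sorted(scores, key=scores.get)


# Toy usage with made-up 2-d embeddings.
grad = [1.0, 0.0]
current = [0.0, 0.0]
vocab = {"a": [2.0, 5.0], "b": [-1.0, 3.0], "c": [0.5, 0.0]}
ranking = rank_substitutions(grad, current, vocab)
```

A real pipeline would obtain `grad` by backpropagation through the model and would still need to verify, with an actual forward pass, that the prediction is unchanged after the swap; the linear estimate is only a cheap way to shortlist candidates.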
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Adam · Layer Normalization · LAMB · Attention Dropout · Linear Layer · Softmax · Cosine Annealing · Dense Connections · Weight Decay
