Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models

Yimu Wang; Peng Shi; Hongyang Zhang

arXiv:2307.12507·cs.CL·August 21, 2023

TL;DR

This paper introduces GradObstinate, a gradient-based method for automatically generating obstinate adversarial examples in NLP, outperforming manual antonym-based approaches in success rate and transferability across models.

Contribution

We propose GradObstinate, a novel gradient-based word substitution technique that automatically creates obstinate adversarial examples without manual constraints.

Findings

01. GradObstinate achieves higher attack success rates than antonym-based methods.
02. Obstinate substitutions transfer effectively to black-box models like GPT-3 and ChatGPT.
03. The method is validated across multiple models and NLP benchmarks.

Abstract

In this paper, we study the problem of generating obstinate (over-stability) adversarial examples by word substitution in NLP, where the input text is meaningfully changed but the model's prediction is not, even though it should be. Previous word substitution approaches have predominantly relied on manually designed antonym-based strategies for generating obstinate adversarial examples. This limits their applicability, since such strategies find only a subset of obstinate adversarial examples and require human effort. To address this issue, we introduce GradObstinate, a gradient-based word substitution method that automatically generates obstinate adversarial examples without any constraints on the search space or the need for manual design principles. To empirically evaluate the efficacy of GradObstinate, we conduct comprehensive experiments on five…
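The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of gradient-guided word substitution for obstinate examples, not the authors' GradObstinate procedure. It assumes a toy mean-pooled linear classifier and a HotFlip-style first-order estimate of the loss change; the vocabulary, embeddings, weights, and the function names `probs`, `predict`, `embedding_grad`, and `obstinate_substitute` are all hypothetical.

```python
import math

# Toy vocabulary with hand-picked 2-d embeddings (hypothetical values).
EMB = {
    "the": [0.0, 0.0],
    "movie": [0.1, 0.0],
    "was": [0.0, 0.1],
    "great": [1.0, 0.2],
    "awful": [-1.0, 0.1],
    "long": [0.3, 0.9],
    "blue": [0.6, -0.8],
}
# Toy 2-class linear classifier on the mean embedding (hypothetical weights).
W = [[-2.0, 0.0], [2.0, 0.0]]  # rows: class 0 (negative), class 1 (positive)


def probs(tokens):
    """Softmax class probabilities for a list of tokens."""
    n = len(tokens)
    mean = [sum(EMB[t][d] for t in tokens) / n for d in range(2)]
    logits = [sum(w[d] * mean[d] for d in range(2)) for w in W]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens):
    p = probs(tokens)
    return max(range(len(p)), key=p.__getitem__)


def embedding_grad(tokens, label):
    """Gradient of cross-entropy(label) w.r.t. a token embedding.

    With mean pooling every position shares the gradient W^T (p - y) / n.
    """
    p = probs(tokens)
    err = [p[c] - (1.0 if c == label else 0.0) for c in range(len(p))]
    return [
        sum(W[c][d] * err[c] for c in range(len(W))) / len(tokens)
        for d in range(2)
    ]


def obstinate_substitute(tokens, position):
    """Swap one word so that the model's prediction stays the same.

    Candidates are ranked by the first-order estimate (e_new - e_old) . grad
    of the loss on the ORIGINAL prediction; the lowest estimate is tried first
    and each candidate is confirmed with a real forward pass.
    """
    label = predict(tokens)
    grad = embedding_grad(tokens, label)
    old = EMB[tokens[position]]

    def estimated_loss_change(word):
        return sum((EMB[word][d] - old[d]) * grad[d] for d in range(2))

    candidates = sorted(
        (w for w in EMB if w != tokens[position]), key=estimated_loss_change
    )
    for word in candidates:
        new_tokens = tokens[:position] + [word] + tokens[position + 1:]
        if predict(new_tokens) == label:
            return new_tokens
    return None  # no prediction-preserving substitution found


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great"]
    print(obstinate_substitute(sentence, 3))
```

In this toy setup the word "great" is replaced by "blue": the sentence no longer expresses a positive opinion, yet the classifier still predicts the positive class. The sketch does not check that the meaning actually changed; a real pipeline would need a separate step for that.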

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Adam · Layer Normalization · LAMB · Attention Dropout · Linear Layer · Softmax · Cosine Annealing · Dense Connections · Weight Decay