Generating Valid and Natural Adversarial Examples with Large Language Models

Zimu Wang; Wei Wang; Qi Chen; Qiufeng Wang; Anh Nguyen

arXiv:2311.11861 · cs.CL · November 21, 2023 · 1 citation

Open Access

TL;DR

This paper introduces LLM-Attack, a method that uses large language models to generate adversarial examples that are both valid and natural, preserving semantics and grammaticality better than existing word-level attacks.

Contribution

The paper proposes a two-stage approach, word importance ranking followed by LLM-based synonym replacement, that generates adversarial examples rated as more valid and natural than those of previous methods in both human and GPT-4 evaluations.

Findings

01. LLM-Attack outperforms baseline models in validity and naturalness.

02. Generated adversarial examples preserve semantics and grammaticality.

03. Human and GPT-4 evaluations favor LLM-Attack over existing methods.

Abstract

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, leading to the loss of semantic maintenance, grammaticality, and human imperceptibility. Based on the exceptional capacity of language understanding and generation of large language models (LLMs), we propose LLM-Attack, which aims at generating both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with their synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against…
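The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word importance is scored, so leave-one-out scoring (a common choice in word-level attacks) is assumed here. The victim model `toy_positive_prob`, its `TOY_WEIGHTS` lexicon, and the `TOY_SYNONYMS` table are hypothetical stand-ins; in the paper the synonyms are obtained from an LLM and the victims are real classifiers.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical victim model: a tiny weighted lexicon squashed to P(positive).
TOY_WEIGHTS: Dict[str, float] = {"great": 2.0, "fine": 0.5, "dull": -1.0}

# Hypothetical synonym source; the paper obtains synonyms from an LLM instead.
TOY_SYNONYMS: Dict[str, List[str]] = {"great": ["fine"], "movie": ["film"]}

ProbFn = Callable[[List[str]], float]


def toy_positive_prob(words: List[str]) -> float:
    """Return P(positive) for a tokenized sentence under the toy lexicon."""
    score = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def confidence(words: List[str], prob: ProbFn, positive: bool) -> float:
    """Confidence the model assigns to the originally predicted label."""
    p = prob(words)
    return p if positive else 1.0 - p


def rank_words(words: List[str], prob: ProbFn) -> List[int]:
    """Stage 1 (assumed leave-one-out): order word positions by how much
    deleting each word lowers confidence in the original prediction."""
    positive = prob(words) >= 0.5
    base = confidence(words, prob, positive)
    drops = [
        base - confidence(words[:i] + words[i + 1:], prob, positive)
        for i in range(len(words))
    ]
    return sorted(range(len(words)), key=lambda i: -drops[i])


def attack(
    words: List[str], prob: ProbFn, synonyms: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Stage 2: substitute synonyms for the most important words first and
    stop as soon as the predicted label flips. Returns None on failure."""
    positive = prob(words) >= 0.5
    current = list(words)
    for i in rank_words(words, prob):
        for candidate in synonyms.get(words[i], []):
            trial = current[:i] + [candidate] + current[i + 1:]
            if (prob(trial) >= 0.5) != positive:
                return trial  # prediction flipped: adversarial example found
            if confidence(trial, prob, positive) < confidence(current, prob, positive):
                current = trial  # keep a substitution that weakens the prediction
    return None


if __name__ == "__main__":
    sentence = "a great but dull movie".split()
    print(rank_words(sentence, toy_positive_prob))
    print(attack(sentence, toy_positive_prob, TOY_SYNONYMS))
```

With the toy lexicon, "great" is ranked as the most important word, and replacing it with "fine" is enough to flip the toy prediction. A real attack would also need to check that each substitution keeps the sentence valid and natural, which is the property the paper evaluates.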

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam · Multi-Head Attention