Generating Valid and Natural Adversarial Examples with Large Language Models
Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen

TL;DR
LLM-Attack uses large language models to generate word-level adversarial examples that are both valid and natural, preserving the semantics and grammaticality that existing word-level attacks often lose.
Contribution
The paper proposes a two-stage LLM-based approach, word importance ranking followed by word synonym replacement, whose adversarial examples are rated more valid and natural than those of previous methods in both human and GPT-4 evaluations.
Findings
LLM-Attack outperforms baseline models in validity and naturalness.
Generated adversarial examples preserve semantics and grammaticality.
Human and GPT-4 evaluations favor LLM-Attack over existing methods.
Abstract
Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been shown to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural: they fail to preserve semantics, grammaticality, and human imperceptibility. Building on the strong language understanding and generation capabilities of large language models (LLMs), we propose LLM-Attack, which aims to generate adversarial examples that are both valid and natural. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against…
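The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is a stand-in chosen for illustration: the victim model is a hypothetical lexicon scorer (`WEIGHTS`), importance is estimated by leave-one-out deletion (a common choice in word-level attacks; the abstract does not say how LLM-Attack scores words), and the `SYNONYMS` table plays the role of synonyms that would be requested from an LLM.

```python
import math

# Hypothetical toy victim model: a lexicon-based sentiment scorer.
# The negative weight on "film" mimics a spurious feature a real model might learn.
WEIGHTS = {"good": 1.0, "nice": 0.8, "movie": 0.3, "film": -1.5}

# Hypothetical synonym table standing in for synonyms obtained from an LLM.
SYNONYMS = {"good": ["nice"], "movie": ["film"]}


def prob_of_label(words, label):
    """Probability the toy model assigns to `label` (1 = positive, 0 = negative)."""
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return p_pos if label == 1 else 1.0 - p_pos


def predict(words):
    """Predicted label of the toy model."""
    return 1 if prob_of_label(words, 1) >= 0.5 else 0


def rank_words(words, label):
    """Stage 1 (assumed scoring): rank positions by the confidence drop when deleted."""
    base = prob_of_label(words, label)
    drops = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]
        drops.append((base - prob_of_label(reduced, label), i))
    drops.sort(key=lambda pair: -pair[0])  # largest drop first
    return [i for _, i in drops]


def attack(words, label, get_synonyms):
    """Stage 2: greedily swap ranked words for synonyms until the prediction flips."""
    adv = list(words)
    for i in rank_words(words, label):
        candidates = get_synonyms(adv[i])
        if not candidates:
            continue
        # Pick the synonym that most reduces confidence in the true label.
        best = min(
            candidates,
            key=lambda c: prob_of_label(adv[:i] + [c] + adv[i + 1:], label),
        )
        trial = adv[:i] + [best] + adv[i + 1:]
        if prob_of_label(trial, label) < prob_of_label(adv, label):
            adv = trial
        if predict(adv) != label:
            return adv  # successful adversarial example
    return None  # attack failed within the available synonyms


sentence = ["a", "good", "movie"]
adversarial = attack(sentence, 1, lambda w: SYNONYMS.get(w, []))
print(adversarial)  # ['a', 'nice', 'film']
```

In this toy run the swapped sentence keeps its meaning for a human reader while the scorer's prediction flips from positive to negative, which is the sense of "valid" used above. In a real setting the `get_synonyms` callback would wrap an LLM prompt and the victim would be a fine-tuned PLM.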
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam · Multi-Head Attention
