Adversarial Decoding: Generating Readable Documents for Adversarial Objectives

Collin Zhang; Tingwei Zhang; Vitaly Shmatikov

arXiv:2410.02163·cs.CL·March 7, 2025


TL;DR

This paper introduces adversarial decoding, a versatile text generation method that creates readable adversarial documents capable of evading filters and influencing retrieval-augmented generation (RAG) systems, and that outperforms existing techniques.

Contribution

It presents a novel, generic decoding approach that handles complex adversarial objectives, including embedding similarity, and produces adversarial texts that are both more effective and more readable than those of prior methods.
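To make "embedding similarity" concrete as an objective: one common formulation for retrieval-oriented attacks is the mean cosine similarity between a candidate document's embedding and the embeddings of a set of target queries, so that the document is retrieved for a broad class of queries rather than a single one. The sketch below is illustrative only. The `embed` function is a hypothetical bag-of-words stand-in for a real dense retriever, and `embedding_objective` is not taken from the paper or its repository.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a dense retriever: an L2-normalized bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product equals the cosine similarity.
    return sum(x * v.get(word, 0.0) for word, x in u.items())


def embedding_objective(doc: str, queries: list[str]) -> float:
    """Mean cosine similarity between one document and a set of target queries."""
    d = embed(doc)
    return sum(cosine(d, embed(q)) for q in queries) / len(queries)


queries = ["best hotel in paris", "cheap hotel in paris"]
print(embedding_objective("paris hotel guide", queries))        # higher: shares "paris", "hotel"
print(embedding_objective("recipe for banana bread", queries))  # 0.0: no shared words
```

Averaging over many queries, rather than maximizing similarity to one, is what pushes a document toward being retrieved for a whole topic.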

Findings

01

Outperforms existing methods in producing readable adversarial documents

02

Effective against RAG poisoning, jailbreaking, and filter evasion

03

Generates documents that influence retrieval and subsequent generation processes
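A simple way to quantify whether a document "influences retrieval" is to count how often it appears in the top-k results across a set of queries. The sketch below assumes cosine-similarity retrieval over precomputed vectors. `top_k_hit_rate` and the toy vectors are hypothetical and are not the paper's evaluation code.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k_hit_rate(query_vecs, corpus_vecs, adv_index: int, k: int = 5) -> float:
    """Fraction of queries for which corpus_vecs[adv_index] ranks in the top-k
    by cosine similarity (all vectors are normalized before scoring)."""
    corpus = [normalize(c) for c in corpus_vecs]
    hits = 0
    for q in query_vecs:
        qn = normalize(q)
        # Rank corpus indices by similarity to the query, highest first.
        ranked = sorted(range(len(corpus)), key=lambda i: dot(qn, corpus[i]), reverse=True)
        if adv_index in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy corpus: three "topic" passages plus one injected passage (index 3)
# that sits between the first two topics.
corpus = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.7, 0.7, 0]]
queries = [[1, 0.2, 0], [0.1, 1, 0], [0, 0, 1]]
for k in (1, 2, 4):
    print(k, top_k_hit_rate(queries, corpus, adv_index=3, k=k))
```

The injected passage never wins the top slot here but reaches the top-2 for two of the three queries, which is enough to enter a RAG context window that passes several passages to the generator.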

Abstract

We design, implement, and evaluate adversarial decoding, a new, generic text generation technique that produces readable documents for different adversarial objectives. Prior methods either produce easily detectable gibberish, or cannot handle objectives that include embedding similarity. In particular, they only work for direct attacks (such as jailbreaking) and cannot produce adversarial text for realistic indirect injection, e.g., documents that (1) are retrieved in RAG systems in response to broad classes of queries, and also (2) adversarially influence subsequent generation. We also show that fluency (low perplexity) is not sufficient to evade filtering. We measure the effectiveness of adversarial decoding for different objectives, including RAG poisoning, jailbreaking, and evasion of defensive filters, and demonstrate that it outperforms existing methods while producing readable…
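The abstract describes adversarial decoding as a generic generation technique without spelling out the search procedure. One way to picture decoding toward an adversarial objective is beam search in which every candidate extension is scored by a task-specific function. The sketch below is a generic beam search under stated assumptions, not the authors' implementation (the collinzrj/adversarial_decoding repository is the reference for that). `toy_propose` is a hypothetical stand-in for an LLM's top-k next-token candidates, and `toy_score` is a hypothetical objective that rewards target words and penalizes repetition as a crude readability proxy.

```python
from typing import Callable, List, Tuple


def beam_decode(
    propose: Callable[[List[str]], List[str]],
    score: Callable[[List[str]], float],
    beam_width: int = 3,
    steps: int = 4,
) -> List[str]:
    """Generic beam search: extend each beam with the tokens returned by
    `propose`, score every extension with `score`, keep the best `beam_width`."""
    beams: List[List[str]] = [[]]
    for _ in range(steps):
        candidates: List[Tuple[float, List[str]]] = []
        for seq in beams:
            for tok in propose(seq):
                new_seq = seq + [tok]
                candidates.append((score(new_seq), new_seq))
        if not candidates:  # nothing left to extend
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = [seq for _, seq in candidates[:beam_width]]
    return beams[0]


# Hypothetical toy setup for illustration only.
VOCAB = ["paris", "hotel", "the", "best", "banana"]
TARGETS = {"paris", "hotel", "best"}


def toy_propose(seq: List[str]) -> List[str]:
    # A real system would query a language model for likely next tokens here.
    return VOCAB


def toy_score(seq: List[str]) -> float:
    # Reward distinct target words; subtract one point per repeated token.
    return len(TARGETS & set(seq)) - (len(seq) - len(set(seq)))


result = beam_decode(toy_propose, toy_score, beam_width=3, steps=3)
print(result)
```

Drawing candidates from a language model, rather than from the whole vocabulary as gradient-guided token search does, is what tends to keep the output readable; the abstract's point that low perplexity alone does not evade filters suggests the scoring function needs more than a fluency term.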


Code & Models

Repositories

collinzrj/adversarial_decoding
pytorch · Official

Videos

Adversarial Decoding: Generating Readable Documents for Adversarial Objectives · Underline

Taxonomy

Topics: Digital Media Forensic Detection

Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Attention Dropout · Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection · WordPiece