The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems

Hongru Song; Yu-an Liu; Ruqing Zhang; Jiafeng Guo; Jianming Lv; Maarten de Rijke; Xueqi Cheng

arXiv:2505.18583 · cs.IR · May 29, 2025

TL;DR

This paper introduces ReGENT, a reinforcement learning framework for generating imperceptible adversarial examples that mislead retrieval-augmented generation systems through subtle perturbations that bring a target document into the retrieved top-$k$ candidate set.

Contribution

The paper presents a novel imperceptible retrieve-to-generate attack method and a reinforcement learning-based framework, ReGENT, to effectively deceive RAG systems.

Findings

01. ReGENT outperforms existing attack methods in misleading RAG systems.

02. ReGENT generates adversarial examples that are imperceptible to humans.

03. Experiments on newly constructed factual and non-factual question-answering benchmarks validate the effectiveness of ReGENT.

Abstract

We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set, in order to influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small…
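The abstract frames success as moving a target document from outside the initial top-$k$ candidate set into it, and describes refining the attack with relevance, generation, and naturalness rewards. The following is not ReGENT and omits the reinforcement learning loop entirely; it is a minimal sketch of how such reward signals could be scored, assuming a binary top-$k$ indicator for relevance, case-insensitive substring matching for generation, a caller-supplied naturalness score in [0, 1], and an equally weighted sum. Every function name and weight here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not say how the three
    # rewards are balanced, so equal weights are assumed here.
    relevance: float = 1.0
    generation: float = 1.0
    naturalness: float = 1.0


def target_rank(scores: dict[str, float], target_id: str) -> int:
    """1-based rank of target_id when documents are sorted by descending score."""
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ranked.index(target_id) + 1


def relevance_reward(scores: dict[str, float], target_id: str, k: int) -> float:
    """Assumed signal: 1.0 if the target document is in the top-k set, else 0.0."""
    return 1.0 if target_rank(scores, target_id) <= k else 0.0


def generation_reward(answer: str, target_answer: str) -> float:
    """Assumed signal: 1.0 if the RAG answer contains the attacker's intended answer."""
    return 1.0 if target_answer.lower() in answer.lower() else 0.0


def composite_reward(rel: float, gen: float, nat: float,
                     w: RewardWeights = RewardWeights()) -> float:
    """Assumed combination rule: weighted sum of the three reward signals."""
    return w.relevance * rel + w.generation * gen + w.naturalness * nat


if __name__ == "__main__":
    k = 2
    # Retrieval scores before and after a (hypothetical) perturbation of the target.
    before = {"d1": 0.91, "d2": 0.87, "target": 0.52, "d3": 0.40}
    after = {"d1": 0.91, "d2": 0.87, "target": 0.89, "d3": 0.40}
    print(relevance_reward(before, "target", k))  # 0.0 (target ranked 3rd)
    print(relevance_reward(after, "target", k))   # 1.0 (target ranked 2nd)
    total = composite_reward(relevance_reward(after, "target", k),
                             generation_reward("The answer is Lyon.", "Lyon"),
                             nat=0.5)
    print(total)  # 2.5
```

A binary relevance signal is the simplest reading of "retrieve a target document ... originally excluded from the initial top-$k$"; a denser signal such as rank improvement would give a learning agent more feedback per step, but the abstract does not say which form the paper uses.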

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Byte Pair Encoding