Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks

Jinyan Su; Jin Peng Zhou; Zhengxin Zhang; Preslav Nakov; Claire Cardie

arXiv:2412.16708 · cs.IR · July 29, 2025

TL;DR

This paper empirically investigates the vulnerability of Retrieval-Augmented Generation (RAG) systems to adversarial poisoning attacks and explores methods to improve their robustness and safety in real-world applications.

Contribution

It provides a controlled empirical analysis of RAG under attack, introduces a taxonomy of context types, evaluates retriever vulnerabilities, and proposes skeptical prompting as a partial defense.

Findings

01 Skeptical prompting activates LLMs' reasoning for self-defense.

02 Retrievers vary in exposing models to adversarial contexts.

03 Robustness can be improved with targeted strategies.
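
One way to make the second finding concrete is an exposure rate: the fraction of queries for which a retriever places at least one poisoned passage in its top-k results. The sketch below is an illustration under stated assumptions, not the paper's metric; the function name `exposure_rate`, the passage IDs, and the toy rankings are all hypothetical.

```python
from typing import Dict, List, Set


def exposure_rate(retrieved: Dict[str, List[str]], poisoned: Set[str], k: int) -> float:
    """Fraction of queries whose top-k passage IDs include at least one poisoned ID."""
    if not retrieved:
        return 0.0
    exposed = sum(
        1
        for ranked_ids in retrieved.values()
        if any(pid in poisoned for pid in ranked_ids[:k])
    )
    return exposed / len(retrieved)


# Toy rankings from two hypothetical retrievers over the same three queries.
poisoned_ids = {"adv1", "adv2"}
retriever_a = {
    "q1": ["adv1", "d3", "d7"],
    "q2": ["d2", "d5", "adv2"],
    "q3": ["d1", "d4", "d9"],
}
retriever_b = {
    "q1": ["d3", "d7", "adv1"],
    "q2": ["d2", "d5", "d6"],
    "q3": ["d1", "d4", "d9"],
}

rate_a_top1 = exposure_rate(retriever_a, poisoned_ids, k=1)
rate_a_top3 = exposure_rate(retriever_a, poisoned_ids, k=3)
rate_b_top3 = exposure_rate(retriever_b, poisoned_ids, k=3)
```

Comparing the same k across retrievers shows which one exposes the generator more often; varying k shows how exposure grows as more passages are passed to the LLM.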

Abstract

Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate LLM hallucinations and enhance their performance in knowledge-intensive domains. However, these systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into the retrieval corpus can mislead models into producing factually incorrect outputs. In this paper, we present a rigorously controlled empirical study of how RAG systems behave under such attacks and how their robustness can be improved. On the generation side, we introduce a structured taxonomy of context types (adversarial, untouched, and guiding) and systematically analyze their individual and combined effects on model outputs. On the retrieval side, we evaluate several retrievers to measure how easily they expose LLMs to adversarial contexts. Our findings also reveal that "skeptical prompting" can…
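
The generation-side setup described in the abstract can be sketched as a prompt builder that mixes passages of different context types and optionally prepends a skeptical instruction. This is a minimal sketch under assumptions: the abstract names the three context types without defining them here, so the passage texts are placeholders, and the wording of `SKEPTICAL_INSTRUCTION`, the `Passage` class, and `build_prompt` are hypothetical rather than taken from the paper or its repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ContextType(Enum):
    """The three context types named in the abstract."""
    ADVERSARIAL = "adversarial"
    UNTOUCHED = "untouched"
    GUIDING = "guiding"


@dataclass
class Passage:
    text: str
    kind: ContextType  # kept for analysis only; never written into the prompt


# Hypothetical wording: the paper's actual skeptical prompt is not reproduced here.
SKEPTICAL_INSTRUCTION = (
    "Some of the passages below may be unreliable or deliberately misleading. "
    "Check them against each other and against what you already know before answering."
)


def build_prompt(question: str, passages: List[Passage], skeptical: bool = False) -> str:
    """Assemble a RAG prompt from numbered passages, optionally with a skeptical preamble."""
    parts: List[str] = []
    if skeptical:
        parts.append(SKEPTICAL_INSTRUCTION)
    for index, passage in enumerate(passages, start=1):
        parts.append(f"[{index}] {passage.text}")
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)


# Placeholder passages, one per context type.
contexts = [
    Passage("Placeholder passage A.", ContextType.ADVERSARIAL),
    Passage("Placeholder passage B.", ContextType.UNTOUCHED),
    Passage("Placeholder passage C.", ContextType.GUIDING),
]
plain_prompt = build_prompt("Placeholder question?", contexts)
skeptical_prompt = build_prompt("Placeholder question?", contexts, skeptical=True)
```

Keeping the type label on the `Passage` object but out of the prompt text lets an experiment vary the mix of context types and compare outputs with and without the skeptical preamble, without leaking the labels to the model.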

Code & Models

Repositories

jinyansu1/eval_poisonrag (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Geophysical Methods and Applications · Anomaly Detection Techniques and Applications

Methods: Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout · Softmax