Certifiably Robust RAG against Retrieval Corruption

Chong Xiang; Tong Wu; Zexuan Zhong; David Wagner; Danqi Chen; Prateek Mittal

arXiv:2405.15556·cs.LG·April 2, 2026·5 cites

TL;DR

RobustRAG is a defense framework for retrieval-augmented generation that provides certifiable robustness against retrieval corruption attacks: for certain queries, it can formally certify a lower bound on response quality even when malicious passages are injected into the retrieval results.

Contribution

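The paper introduces RobustRAG, the first defense framework with certifiable robustness for RAG. It uses an isolate-then-aggregate strategy, instantiated with keyword-based and decoding-based algorithms for securely aggregating unstructured text responses.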

Findings

01

Achieves certifiable robustness: for certain queries, non-trivial lower bounds on response quality can be formally certified.

02

Effective across multiple datasets, tasks, and large language models.

03

Demonstrates resilience against adaptive attackers who inject a bounded number of malicious passages.

Abstract

Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker…
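The isolate-then-aggregate strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it is a simplified keyword-voting variant written under stated assumptions. The names `isolate_then_aggregate`, `extract_keywords`, `group_size`, `min_votes`, and the `generate` callable (a stand-in for an LLM call) are all hypothetical, and the whitespace tokenizer and stopword list are placeholders for a real keyword extractor.

```python
from collections import Counter
from typing import Callable, List, Set

# Hypothetical, tiny stopword list; a real system would use a proper extractor.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "to", "and"}


def extract_keywords(text: str) -> Set[str]:
    """Return the set of lowercased, punctuation-stripped, non-stopword tokens."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def isolate_then_aggregate(
    query: str,
    passages: List[str],
    generate: Callable[[str, str], str],  # assumed LLM stand-in: (query, context) -> response
    group_size: int = 1,
    min_votes: int = 2,
) -> List[str]:
    """Isolate passages into disjoint groups, respond per group, then vote on keywords."""
    # 1. Isolate: split the retrieved passages into disjoint groups.
    groups = [passages[i:i + group_size] for i in range(0, len(passages), group_size)]

    # 2. Generate one response per group from its concatenated passages.
    responses = [generate(query, "\n".join(group)) for group in groups]

    # 3. Aggregate: each response contributes at most one vote per keyword.
    votes: Counter = Counter()
    for response in responses:
        votes.update(extract_keywords(response))

    # Keep only keywords supported by at least `min_votes` isolated responses.
    return sorted(k for k, count in votes.items() if count >= min_votes)


def toy_generate(query: str, context: str) -> str:
    """Toy stand-in for an LLM: simply echoes the context it was given."""
    return context


retrieved = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Berlin is the capital.",  # plays the role of a corrupted passage
]
keywords = isolate_then_aggregate("What is the capital of France?", retrieved, toy_generate)
print(keywords)
```

In this toy run the corrupted passage sits in its own group, so it can influence only one response, and its keyword `berlin` receives a single vote and falls below `min_votes`. That is the intuition behind isolation: a bounded number of malicious passages can affect only a bounded number of votes.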
