Certifiably Robust RAG against Retrieval Corruption
Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal

TL;DR
RobustRAG is a defense framework that provides certifiable robustness against retrieval corruption attacks in retrieval-augmented generation (RAG), keeping responses reliable even when malicious passages are injected into the retrieval results.
Contribution
It introduces RobustRAG, the first defense with certifiable robustness for RAG, built on an isolate-then-aggregate strategy and new keyword-based and decoding-based algorithms for aggregating unstructured text responses.
Findings
Achieves certifiable robustness: for certain queries in the evaluation datasets, non-trivial lower bounds on response quality can be formally certified.
Effective across multiple datasets, tasks, and large language models.
Demonstrates resilience against adaptive attackers who can inject a bounded number of malicious passages.
Abstract
Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker…
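The isolate-then-aggregate strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`isolate`, `keyword_aggregate`, `robust_answer`), the `group_size` and `min_count` parameters, and the use of whitespace-separated tokens as "keywords" are all assumptions made for illustration, and `llm` stands in for any callable that maps a context string to a response string.

```python
from collections import Counter
from typing import Callable, List


def isolate(passages: List[str], group_size: int = 1) -> List[List[str]]:
    """Split retrieved passages into disjoint groups of at most group_size."""
    return [passages[i:i + group_size] for i in range(0, len(passages), group_size)]


def keyword_aggregate(responses: List[str], min_count: int) -> List[str]:
    """Keep keywords that appear in at least min_count isolated responses.

    Assumption: a "keyword" is just a lowercased whitespace token here.
    """
    counts: Counter = Counter()
    for response in responses:
        # Count each keyword once per response, so a single group
        # cannot inflate a keyword's count by repeating it.
        counts.update(set(response.lower().split()))
    return sorted(word for word, n in counts.items() if n >= min_count)


def robust_answer(
    passages: List[str],
    llm: Callable[[str], str],
    group_size: int = 1,
    min_count: int = 2,
) -> List[str]:
    """Isolate passages, query the LLM per group, then aggregate keywords."""
    groups = isolate(passages, group_size)
    # Each response is generated from one isolated group only.
    responses = [llm("\n".join(group)) for group in groups]
    return keyword_aggregate(responses, min_count)
```

In this simplified sketch, each isolated group contributes at most one count per keyword, so with `min_count=2` a single corrupted group cannot push a keyword into the output on its own. How the actual keyword-based and decoding-based algorithms extract, filter, and turn aggregated content into a final response is not specified in this summary.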
