TL;DR
This paper evaluates how different RAG architectures withstand knowledge base poisoning attacks, showing that architecture significantly influences adversarial robustness and identifying the content-reasoning stage as a key point of vulnerability.
Contribution
It compares four RAG architectures under adversarial poisoning, introduces a behavioral taxonomy, and provides insights into architecture-specific robustness and failure modes.
Findings
Under CorruptRAG-AK, attack success rates range from 81.9% (vanilla RAG) to 24.4% (RLM) across architectures.
The content-reasoning stage is the main vulnerability point.
MADAM-RAG shows high contradiction detection but struggles with reliable resolution.
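To make the attack success figures above concrete, here is a minimal sketch of how such a rate could be computed from per-question outcomes. It is an illustration under stated assumptions, not the paper's scoring procedure: the `Outcome` record, the `normalize` helper, and the substring-match rule are all hypothetical, and the toy data is invented.

```python
import re
from dataclasses import dataclass


@dataclass
class Outcome:
    """One evaluated question (hypothetical record layout)."""
    model_answer: str   # what the RAG system answered
    target_answer: str  # the incorrect answer the poisoned document pushes


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def attack_success_rate(outcomes: list[Outcome]) -> float:
    """Fraction of questions whose answer contains the attacker's target.

    Substring matching is an assumption made for this sketch; the paper
    may score success differently.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    hits = sum(
        normalize(o.target_answer) in normalize(o.model_answer)
        for o in outcomes
    )
    return hits / len(outcomes)


# Toy data, invented for illustration only.
toy = [
    Outcome("The capital is Sydney.", "Sydney"),              # attack succeeded
    Outcome("Canberra", "Sydney"),                            # attack failed
    Outcome("Sources conflict; likely Canberra.", "Sydney"),  # attack failed
    Outcome("sydney", "Sydney"),                              # attack succeeded
]
print(f"ASR = {attack_success_rate(toy):.1%}")
```

Running the same function once per architecture and per condition (clean baseline, naive injection, CorruptRAG-AK) would give a grid of rates comparable in shape to the one summarized here.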
Abstract
Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively against vanilla retrieve-then-generate pipelines. Architectures designed to handle conflicting retrieved information - multi-agent debate, agentic retrieval, recursive language models - remain untested against adversarially optimized contradictions. We evaluate four RAG architectures (vanilla RAG, agentic RAG, MADAM-RAG, and Recursive Language Models) under controlled single-document (N=1) poisoning on 921 Natural Questions QA pairs, comparing a clean baseline, naive injection, and CorruptRAG-AK - an adversarial attack whose meta-epistemic framing targets credibility assessment. Architecture is a high-impact variable in adversarial robustness: under CorruptRAG-AK, attack success rates range from 81.9% (vanilla) to 24.4% (RLM) - a spread of…
