Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu

TL;DR
This paper introduces a novel training framework for retrieval-augmented generation systems that uses an interactive proof protocol to improve grounding, reduce hallucinations, and verify evidence in large language models.
Contribution
It adapts Merlin-Arthur protocols to train RAG systems, enabling models to verify evidence, reject unsupported answers, and improve factual grounding without manual unanswerable annotations.
Findings
Reduces hallucinations in RAG systems.
Improves evidence grounding and verification.
Improves retrieval recall and mean reciprocal rank (MRR).
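The retrieval metrics named in the findings are standard ranking measures. As a minimal sketch (the function names and the toy document IDs are illustrative assumptions, not taken from the paper), recall@k and MRR could be computed like this:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical IDs): first query hits at rank 2, second at rank 1.
runs = [(["d3", "d1", "d2"], ["d1"]), (["d5", "d6"], ["d5"])]
mrr = mean_reciprocal_rank(runs)                            # (0.5 + 1.0) / 2 = 0.75
r_at_2 = recall_at_k(["d3", "d1", "d2"], ["d1", "d2"], 2)   # 1 of 2 relevant = 0.5
```

How the paper itself defines k or aggregates over queries is not stated in this summary, so treat the choices above as placeholders.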
Abstract
Retrieval-augmented generation (RAG) relies on retrieved context to guide large language models (LLMs), yet treats retrieval as a weak heuristic rather than verifiable evidence -- leading to unsupported answers, hallucinations, and reliance on spurious context. We introduce a novel training framework that treats the RAG pipeline as an interactive proof system by adapting the Merlin-Arthur (M/A) protocol: Arthur (the generator LLM) trains on questions with unknown context provenance, where Merlin supplies helpful evidence and Morgana injects adversarial, misleading context. Both provers use an XAI method to identify and modify the evidence most influential to Arthur. This trains Arthur to (1) answer when evidence supports the answer, (2) reject when evidence is insufficient, and (3) rely on the context spans that truly ground the answer. We further introduce a verification framework that disentangles…
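The three target behaviours described in the abstract can be illustrated with a toy data-construction step. This is only a sketch under loud assumptions: the `[REJECT]` label, the `Example` record, and all function names are hypothetical; a plain substring check stands in for the XAI attribution method; and this Morgana merely withholds the supporting spans rather than injecting misleading ones as the paper describes.

```python
from dataclasses import dataclass

REJECT = "[REJECT]"  # hypothetical abstention label, not from the paper


@dataclass
class Example:
    question: str
    context: str
    target: str
    prover: str  # kept for analysis only; Arthur would never see this field


def supports(span, answer):
    # Stand-in for the XAI attribution step: a simple substring match.
    return answer.lower() in span.lower()


def merlin(spans, answer):
    """Helpful prover (toy): keep only the spans that support the answer."""
    return [s for s in spans if supports(s, answer)]


def morgana(spans, answer):
    """Adversarial prover (toy): strip every span that supports the answer."""
    return [s for s in spans if not supports(s, answer)]


def build_examples(question, spans, answer):
    """Pair each question with a supported and an unsupported context."""
    examples = []
    helpful = merlin(spans, answer)
    if helpful:
        # Behaviours (1) and (3): answer, using only the grounding spans.
        examples.append(Example(question, " ".join(helpful), answer, "merlin"))
    # Behaviour (2): evidence is insufficient, so the target is a rejection.
    examples.append(
        Example(question, " ".join(morgana(spans, answer)), REJECT, "morgana")
    )
    return examples


spans = ["Paris is the capital of France.", "France borders Spain."]
examples = build_examples("What is the capital of France?", spans, "Paris")
```

Note that the rejection targets here are derived automatically from the answerable example, which mirrors the stated goal of avoiding manual unanswerable annotations; the paper's actual construction may differ.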
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Scientific Computing and Data Management
