Facet-Level Tracing of Evidence Uncertainty and Hallucination in RAG
Passant Elchafei, Monorama Swain, Shahed Masoudian, Markus Schedl

TL;DR
This paper introduces a facet-level diagnostic framework for RAG systems and finds that hallucinations stem mainly from evidence misalignment and evidence override rather than from retrieval errors.
Contribution
The paper proposes a structured, facet-level analysis method for diagnosing how evidence is used and why hallucinations occur in RAG, giving finer-grained insight than answer-level evaluation.
Findings
Hallucinations are driven more by evidence integration issues than by retrieval errors.
Facet-level analysis uncovers systematic evidence override and misalignment patterns.
Evaluation across multiple models and datasets highlights recurring failure modes.
Abstract
Retrieval-Augmented Generation (RAG) aims to reduce hallucination by grounding answers in retrieved evidence, yet hallucinated answers remain common even when relevant documents are available. Existing evaluations focus on answer-level or passage-level accuracy, offering limited insight into how evidence is used during generation. In this work, we introduce a facet-level diagnostics framework for QA that decomposes each input question into atomic reasoning facets. For each facet, we assess evidence sufficiency and grounding using a structured Facet × Chunk matrix that combines retrieval relevance with natural language inference-based faithfulness scores. To diagnose evidence usage, we analyze three controlled inference modes: Strict RAG, which enforces exclusive reliance on retrieved evidence; Soft RAG, which allows integration of retrieved evidence and parametric knowledge; and…
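The Facet × Chunk matrix described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how relevance and faithfulness scores are combined, so the thresholds (rel_min, ent_min), the label names, and the toy scores below are all assumptions made for illustration. In practice the relevance scores would come from a retriever and the entailment scores from an NLI model.

```python
# Illustrative sketch only: thresholds, labels, and scores are hypothetical,
# not taken from the paper.

def build_facet_chunk_matrix(relevance, entailment):
    """Pair up per-cell scores: rows are facets, columns are retrieved chunks.

    relevance[i][j]  : assumed retrieval relevance of chunk j to facet i, in [0, 1]
    entailment[i][j] : assumed NLI entailment score that chunk j supports facet i
    """
    return [
        list(zip(rel_row, ent_row))
        for rel_row, ent_row in zip(relevance, entailment)
    ]

def label_facet(row, rel_min=0.5, ent_min=0.5):
    """Assign a coarse (hypothetical) diagnostic label to one facet row."""
    relevant = [(r, e) for r, e in row if r >= rel_min]
    if not relevant:
        return "no_evidence"                # nothing relevant was retrieved
    if any(e >= ent_min for _, e in relevant):
        return "grounded"                   # a relevant chunk entails the facet
    return "retrieved_but_unsupported"      # relevant chunks exist, none entails it

# Toy scores for 3 facets x 3 chunks (made up for this example).
relevance = [
    [0.9, 0.2, 0.1],
    [0.3, 0.7, 0.1],
    [0.1, 0.2, 0.3],
]
entailment = [
    [0.8, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 0.0, 0.0],
]

matrix = build_facet_chunk_matrix(relevance, entailment)
labels = [label_facet(row) for row in matrix]
print(labels)
```

Keeping the two scores separate per cell, rather than multiplying them into one number, makes it possible to tell a retrieval gap (no relevant chunk) apart from a grounding gap (relevant chunk that does not support the facet), which is the kind of distinction the facet-level framing is after.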
