TL;DR
This paper introduces SABER, a self-aware belief estimator for retrieval-augmented generation, enabling LLMs to recognize their knowledge limits and improve answer reliability without fine-tuning.
Contribution
The paper proposes SABER, a self-awareness mechanism for LLMs in RAG, together with a model-specific knowledge-conflict benchmark and a decision framework for choosing when to trust an answer and when to abstain.
Findings
SABER improves accuracy and faithfulness over baselines on conflict-heavy datasets.
It enables a tunable balance between coverage and answer risk via abstention.
SABER outperforms prompt-based abstainers in risk-coverage trade-offs.
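The risk-coverage trade-off mentioned in the findings can be illustrated with a minimal sketch. It assumes each answer comes with a scalar confidence score and a correctness flag, and that the system abstains whenever the score falls below a threshold. The function name `risk_coverage_curve` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def risk_coverage_curve(
    confidences: List[float], correct: List[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, risk) for every distinct confidence value.

    coverage = fraction of questions answered (confidence >= threshold)
    risk     = error rate among the answered questions
    """
    n = len(confidences)
    points = []
    for threshold in sorted(set(confidences)):
        # Keep only the answers the system would not abstain on.
        answered = [c for s, c in zip(confidences, correct) if s >= threshold]
        coverage = len(answered) / n
        risk = 1.0 - sum(answered) / len(answered)
        points.append((threshold, coverage, risk))
    return points


# Toy data (hypothetical): two confident correct answers, two less confident wrong ones.
curve = risk_coverage_curve([0.9, 0.8, 0.6, 0.3], [True, True, False, False])
for threshold, coverage, risk in curve:
    print(f"threshold={threshold:.1f} coverage={coverage:.2f} risk={risk:.2f}")
```

Raising the threshold lowers coverage and, when the confidence score is informative, lowers risk as well. Comparing two abstention methods then amounts to comparing their curves at matched coverage.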
Abstract
Retrieval-augmented generation (RAG) improves large language models (LLMs) by incorporating external evidence, but it also introduces knowledge conflicts when retrieved contextual knowledge (CK) and parametric knowledge (PK) disagree or are both unreliable. Existing approaches mainly coordinate which source to use, without explicitly asking whether each answer path is correct. We argue that faithful RAG requires LLM self-awareness, namely the ability to recognize the limits of its own knowledge and reasoning. To ground this problem, we construct a model-specific, ground-truth-aligned knowledge-conflict benchmark by evaluating LLM backbones on PK-only and CK-conditioned answer paths over approximately 69K query-context instances per backbone, drawn from five conflict-QA datasets. We then introduce SABER, a Self-Aware Belief Estimator for RAG that requires no LLM fine-tuning. SABER…
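As a rough illustration of what a model-specific, ground-truth-aligned benchmark label could look like, the sketch below scores the PK-only and CK-conditioned answers for one instance against the gold answer and assigns one of four outcomes. The normalized exact-match scoring, the `Instance` fields, and the label names are assumptions made for illustration; the excerpt above does not specify the paper's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    gold: str        # ground-truth answer
    pk_answer: str   # model answer without retrieved context (PK-only path)
    ck_answer: str   # model answer with retrieved context (CK-conditioned path)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a simplified matching rule)."""
    return " ".join(text.lower().split())


def label(inst: Instance) -> str:
    """Assign a hypothetical four-way label from the two answer paths."""
    pk_ok = normalize(inst.pk_answer) == normalize(inst.gold)
    ck_ok = normalize(inst.ck_answer) == normalize(inst.gold)
    if pk_ok and ck_ok:
        return "both_correct"
    if pk_ok:
        return "pk_only_correct"   # retrieved context misled the model
    if ck_ok:
        return "ck_only_correct"   # retrieved context fixed the answer
    return "both_wrong"            # neither path is reliable


example = Instance(gold="Paris", pk_answer="paris", ck_answer="Lyon")
print(label(example))
```

Because the labels depend on what a given backbone actually answers, the same query-context pair can receive different labels for different models, which is what makes such a benchmark model-specific.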
