Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis

Heydar Soudani; Evangelos Kanoulas; Faegheh Hasibi

arXiv:2505.07459 · cs.IR · June 11, 2025

TL;DR

This paper critically examines the limitations of current Uncertainty Estimation methods in Retrieval-Augmented Generation, introduces an axiomatic framework to evaluate them, and proposes a calibration approach to enhance their reliability.

Contribution

It presents an axiomatic framework for assessing UE methods in RAG and introduces a calibration function that improves uncertainty reliability.

Findings

01

No existing UE method satisfies all five proposed axioms in the RAG setting.

02

The calibration function improves the correlation between uncertainty and correctness.
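
One common way to quantify how well uncertainty tracks correctness is AUROC: the probability that a randomly chosen incorrect answer receives a higher uncertainty score than a randomly chosen correct one. The sketch below is only an illustration of that idea; this page does not state which metric the paper uses, and the function name and inputs are hypothetical.

```python
from typing import Sequence


def auroc_uncertainty_vs_error(
    uncertainty: Sequence[float], correct: Sequence[bool]
) -> float:
    """AUROC of uncertainty as a predictor of incorrect answers.

    Hypothetical helper, not taken from the paper. Returns 1.0 when every
    incorrect answer is more uncertain than every correct one, and about
    0.5 when uncertainty carries no signal. Ties count as half a win.
    """
    wrong = [u for u, c in zip(uncertainty, correct) if not c]
    right = [u for u, c in zip(uncertainty, correct) if c]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0
            elif w == r:
                wins += 0.5
    return wins / (len(wrong) * len(right))


# Toy scores (made up for illustration): low uncertainty on correct answers.
scores = [0.1, 0.2, 0.8, 0.9]
labels = [True, True, False, False]
print(auroc_uncertainty_vs_error(scores, labels))
```

Under this reading, a calibration step that "improves correlation" would move such a score closer to 1.0 on the same set of answers.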

Abstract

Large Language Models (LLMs) are valued for their strong performance across various tasks, but they also produce inaccurate or misleading outputs. Uncertainty Estimation (UE) quantifies the model's confidence and helps users assess response reliability. However, existing UE methods have not been thoroughly examined in scenarios like Retrieval-Augmented Generation (RAG), where the input prompt includes non-parametric knowledge. This paper shows that current UE methods cannot reliably assess correctness in the RAG setting. We further propose an axiomatic framework to identify deficiencies in existing methods and guide the development of improved approaches. Our framework introduces five constraints that an effective UE method should meet after incorporating retrieved documents into the LLM's prompt. Experimental results reveal that no existing UE method fully satisfies all the axioms,…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece