Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models

Tianyi Zhou; Johanne Medina; Sanjay Chawla

arXiv:2508.08139 · cs.CL · March 18, 2026

TL;DR

This paper examines whether large language models can estimate the reliability of their own responses from token-level uncertainty, with the goal of detecting confabulations and improving trustworthiness.

Contribution

The paper introduces an uncertainty-based method for assessing LLM response reliability, shows the limitations of relying on direct uncertainty signals alone, and improves detection of unreliable outputs.

Findings

01

Correct in-context information improves both answer accuracy and model confidence.

02

Misleading context leads to confidently incorrect responses.

03

Uncertainty-guided probing improves reliability detection across models.

Abstract

Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses increasing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model behavior and whether LLMs can identify their unreliable responses. We propose a reliability estimation that leverages token-level uncertainty to guide the aggregation of internal model representations. Specifically, we compute aleatoric and epistemic uncertainty from output logits to identify salient tokens and aggregate their hidden states into compact representations for response-level reliability prediction. Through controlled experiments on open QA benchmarks, we find that correct in-context information improves both answer accuracy and model confidence, while misleading context often induces…
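The pipeline described in the abstract (per-token uncertainty from logits, selection of salient tokens, aggregation of their hidden states) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the top-k logits are treated as Dirichlet-style evidence, aleatoric uncertainty is approximated by the entropy of the softmax over those top-k logits, epistemic uncertainty by `K / sum(alpha)`, salience by the sum of the two, and aggregation by mean pooling. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def token_uncertainties(logits: Sequence[float], k: int = 3) -> Tuple[float, float]:
    """Return (aleatoric, epistemic) proxies for one token's logits.

    Assumption (not from the paper): the top-k logits, clipped at zero,
    act as Dirichlet evidence.
    """
    top = sorted(logits, reverse=True)[:k]

    # Aleatoric proxy: entropy of the softmax over the top-k logits.
    m = max(top)
    exps = [math.exp(x - m) for x in top]
    z = sum(exps)
    probs = [e / z for e in exps]
    aleatoric = -sum(p * math.log(p) for p in probs if p > 0.0)

    # Epistemic proxy: K / sum(alpha), with alpha = evidence + 1.
    alphas = [max(x, 0.0) + 1.0 for x in top]
    epistemic = len(top) / sum(alphas)
    return aleatoric, epistemic


def salient_indices(scores: Sequence[float], top_n: int) -> List[int]:
    """Positions of the top_n highest-scoring tokens, in sequence order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:top_n])


def aggregate_hidden_states(hidden: Sequence[Sequence[float]],
                            indices: Sequence[int]) -> List[float]:
    """Mean-pool the hidden states at the selected token positions."""
    dim = len(hidden[0])
    return [sum(hidden[i][d] for i in indices) / len(indices) for d in range(dim)]


def response_representation(token_logits: Sequence[Sequence[float]],
                            hidden: Sequence[Sequence[float]],
                            top_n: int = 2,
                            k: int = 3) -> Tuple[List[int], List[float]]:
    """Select the most uncertain tokens and pool their hidden states."""
    # Assumed salience score: aleatoric + epistemic.
    scores = [sum(token_uncertainties(l, k)) for l in token_logits]
    idx = salient_indices(scores, top_n)
    return idx, aggregate_hidden_states(hidden, idx)


# Toy example: three tokens, vocabulary of three, hidden size of two.
toy_logits = [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.4, 0.3]]
toy_hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
idx, rep = response_representation(toy_logits, toy_hidden)
```

In the toy example the first token has one dominant logit and is therefore the least uncertain, so the two remaining positions are selected and pooled. In a full setup, the pooled vector would be passed to a response-level reliability predictor trained on labeled responses; that predictor is omitted here.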

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques