Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?

Jiayu Liu; Qing Zong; Weiqi Wang; Yangqiu Song

arXiv:2505.24778 · cs.CL · April 14, 2026

TL;DR

This paper tests whether epistemic markers (e.g., "fairly confident") reliably reflect large language models' confidence. It finds that markers generalize well within a distribution but behave inconsistently in out-of-distribution scenarios, and it argues for better confidence estimation methods.

Contribution

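The paper defines marker confidence as the observed accuracy when a model uses a given epistemic marker, then systematically evaluates the stability of that quantity across question-answering datasets and across open-source and proprietary models. It highlights the inconsistency of marker confidence in out-of-distribution settings and argues that improved confidence measures are needed.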

Findings

01

Markers generalize well within the same distribution.

02

Marker confidence is inconsistent in out-of-distribution scenarios.

03

Current markers may not reliably reflect true model uncertainty.

Abstract

As large language models (LLMs) are increasingly used in high-stakes domains, accurately assessing their confidence is crucial. Humans typically express confidence through epistemic markers (e.g., "fairly confident") instead of numerical values. However, it remains unclear whether LLMs consistently use these markers to reflect their intrinsic confidence due to the difficulty of quantifying uncertainty associated with various markers. To address this gap, we first define marker confidence as the observed accuracy when a model employs an epistemic marker. We evaluate its stability across multiple question-answering datasets in both in-distribution and out-of-distribution settings for open-source and proprietary LLMs. Our results show that while markers generalize well within the same distribution, their confidence is inconsistent in out-of-distribution scenarios. These findings raise…
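The abstract's definition of marker confidence (the observed accuracy when a model employs an epistemic marker) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `(marker, is_correct)` record format, the function names, and the toy data are all assumptions, and the absolute-gap comparison between two datasets is an illustrative stability check rather than a metric taken from the paper.

```python
from collections import defaultdict


def marker_confidence(records):
    """Map each epistemic marker to the observed accuracy of answers that used it.

    `records` is an iterable of (marker, is_correct) pairs. This record
    format is a hypothetical one chosen for illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for marker, is_correct in records:
        totals[marker] += 1
        correct[marker] += int(is_correct)
    return {m: correct[m] / totals[m] for m in totals}


def confidence_gaps(source, target):
    """Absolute difference in marker confidence for markers seen in both datasets.

    An illustrative stability check, not a metric taken from the paper.
    """
    src = marker_confidence(source)
    tgt = marker_confidence(target)
    return {m: abs(src[m] - tgt[m]) for m in src.keys() & tgt.keys()}


# Toy data, invented for illustration only.
in_dist = [
    ("fairly confident", True),
    ("fairly confident", True),
    ("fairly confident", False),
    ("not sure", False),
    ("not sure", True),
]
out_dist = [
    ("fairly confident", True),
    ("fairly confident", False),
    ("not sure", False),
    ("not sure", False),
]

print(marker_confidence(in_dist))
print(confidence_gaps(in_dist, out_dist))
```

Under this sketch, a marker whose gap stays near zero across datasets would count as stable, while a large gap would mean the same phrase corresponds to different accuracy levels depending on the data.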

Code & Models

Repositories

HKUST-KnowComp/MarConf (GitHub)
