On Verbalized Confidence Scores for LLMs

Daniel Yang; Yao-Hung Hubert Tsai; Makoto Yamada

arXiv:2412.14737·cs.CL·May 6, 2026·5 cites

TL;DR

This paper investigates whether LLMs can quantify their own uncertainty by verbalizing a confidence score as part of their response, and assesses how reliable those scores are across datasets, models, and prompt methods.

Contribution

It introduces a prompt-based method for eliciting verbalized confidence scores from LLMs and extensively evaluates the calibration and reliability of those scores.
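A minimal sketch of what prompt-based confidence elicitation could look like is given here. The prompt wording, the `Confidence:` output format, the 0-100 scale, and the helper names `build_prompt` and `parse_confidence` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import re
from typing import Optional

# Hypothetical prompt template (an assumption, not the paper's prompt).
PROMPT_TEMPLATE = (
    "Answer the following question.\n"
    "Question: {question}\n"
    "Respond in the format:\n"
    "Answer: <your answer>\n"
    "Confidence: <a number between 0 and 100>"
)

# Matches e.g. "Confidence: 85", "confidence: 72.5" or "Confidence: 90%".
_CONFIDENCE_RE = re.compile(r"confidence\s*:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def build_prompt(question: str) -> str:
    """Fill the question into the hypothetical prompt template."""
    return PROMPT_TEMPLATE.format(question=question)


def parse_confidence(response: str) -> Optional[float]:
    """Extract the verbalized confidence from a model response.

    The score is assumed to be on a 0-100 scale and is returned as a
    probability in [0, 1]. Returns None if no score is found or if the
    score is outside the expected range.
    """
    match = _CONFIDENCE_RE.search(response)
    if match is None:
        return None
    value = float(match.group(1))
    if value > 100.0:
        return None
    return value / 100.0
```

Returning `None` for missing or out-of-range scores keeps malformed responses separate from genuine low-confidence answers, so they can be counted or filtered explicitly.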

Findings

01

The reliability of verbalized confidence scores depends on the prompt method used.

02

Certain prompt strategies yield well-calibrated confidence scores.

03

Verbalized confidence scores can be an effective uncertainty measure.
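Calibration, as referred to in the findings above, is commonly measured with the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy is averaged over the bins, weighted by bin size. The sketch below shows this standard metric; whether the paper uses exactly this metric or binning scheme is an assumption, and the function name and the choice of 10 equal-width bins are illustrative.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Compute ECE with equal-width bins over [0, 1].

    confidences: stated confidence per answer, as probabilities in [0, 1].
    correct: whether each answer was actually correct.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; a confidence of 1.0 goes to the last bin.
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(is_correct)))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that states 90% confidence on every answer but is right only a quarter of the time gets an ECE of 0.65 under this definition, while a model whose stated confidence matches its accuracy in every bin gets 0.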

Abstract

The rise of large language models (LLMs) and their tight integration into our daily lives make it essential to dedicate effort to their trustworthiness. Uncertainty quantification for LLMs can increase human trust in their responses, and it also allows LLM agents to make more informed decisions based on each other's uncertainty. To estimate the uncertainty in a response, internal token logits, task-specific proxy models, or sampling of multiple responses are commonly used. This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens, which is a promising approach to prompt- and model-agnostic uncertainty quantification with low overhead. Using an extensive benchmark, we assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods. Our results reveal that the…

Code & Models

Repositories

danielyxyang/llm-verbalized-uq
github
