How do LLMs Compute Verbal Confidence

Dharshan Kumaran; Arthur Conmy; Federico Barbero; Simon Osindero; Viorica Patraucean; Petar Veličković

arXiv:2603.17839·cs.CL·May 20, 2026

TL;DR

This paper investigates how large language models generate verbal confidence scores. It reports evidence that confidence is computed automatically during answer generation, cached at answer-adjacent positions, and retrieved when requested, and that it reflects an evaluation of answer quality richer than token log-probabilities alone.

Contribution

The paper provides evidence that LLMs cache confidence representations during answer generation and that these representations carry information beyond token log-probabilities, which informs our understanding of model metacognition.

Findings

1. Confidence is cached at answer-adjacent positions before verbalization.
2. Cached confidence representations explain variance beyond token log-probabilities.
3. Verbal confidence reflects automatic, sophisticated self-evaluation in LLMs.
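The claim that cached representations "explain variance beyond token log-probabilities" can be read as a statement about nested regressions. The sketch below is one way to make that concrete, not the authors' analysis. It fits verbal confidence on log-probability alone, then on log-probability plus a feature read from the cached representation, and reports the gain in R². The data are synthetic, and the names `r_squared`, `incremental_r2`, and `cached_feature` are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def incremental_r2(logprob, cached_feature, confidence):
    """Return (baseline R^2, full R^2, gain) for the two nested models."""
    base = r_squared(logprob[:, None], confidence)
    full = r_squared(np.column_stack([logprob, cached_feature]), confidence)
    return base, full, full - base

# Synthetic stand-ins (assumption): no real model activations are used here.
rng = np.random.default_rng(0)
n = 2000
logprob = rng.normal(size=n)                   # answer-token log-probability
cached = 0.5 * logprob + rng.normal(size=n)    # probe readout, partly independent
confidence = 0.6 * logprob + 0.8 * cached + 0.3 * rng.normal(size=n)

base_r2, full_r2, delta_r2 = incremental_r2(logprob, cached, confidence)
print(f"log-prob only R^2: {base_r2:.3f}")
print(f"with cached feature R^2: {full_r2:.3f}")
print(f"incremental R^2: {delta_r2:.3f}")
```

A positive gain on held-out data would indicate that the second predictor carries information that log-probabilities do not. In this toy setup the gain is positive by construction.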

Abstract

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B (across TriviaQA, BigMath, and MMLU), Qwen 2.5 7B, and the reasoning model Magistral Small 24B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the…
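Activation patching, one of the interventions named in the abstract, can be illustrated on a toy network. The sketch below is not the paper's setup. It uses a two-layer NumPy network rather than a transformer, and it patches hidden units rather than token positions. The `forward` and `recovery` helpers are hypothetical. The idea is to run a "clean" and a "corrupted" input, copy selected clean activations into the corrupted run, and measure how much of the clean output is restored.

```python
import numpy as np

def forward(x, W1, W2, patch=None):
    """Toy two-layer network; optionally overwrite chosen hidden units."""
    h = np.tanh(W1 @ x)
    if patch is not None:
        idx, values = patch
        h = h.copy()
        h[idx] = values          # splice in activations from another run
    return float(W2 @ h), h

def recovery(x_clean, x_corrupt, W1, W2, idx):
    """Fraction of the clean-vs-corrupt output gap restored by patching idx."""
    clean_out, clean_h = forward(x_clean, W1, W2)
    corrupt_out, _ = forward(x_corrupt, W1, W2)
    patched_out, _ = forward(x_corrupt, W1, W2, patch=(idx, clean_h[idx]))
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)

# Random weights and inputs (assumption): stand-ins for a real model and prompts.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=8)
x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

all_units = np.arange(8)
no_units = np.array([], dtype=int)
full_recovery = recovery(x_clean, x_corrupt, W1, W2, all_units)
zero_recovery = recovery(x_clean, x_corrupt, W1, W2, no_units)
print(full_recovery, zero_recovery)
```

Patching every hidden unit restores the clean output exactly, and patching none leaves the corrupted output unchanged. Intermediate subsets indicate which units carry the relevant signal.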
