How do LLMs Compute Verbal Confidence
Dharshan Kumaran, Arthur Conmy, Federico Barbero, Simon Osindero, Viorica Patraucean, Petar Veličković

TL;DR
This paper investigates how large language models generate verbal confidence scores. It finds that confidence is computed and cached during answer generation, and that it reflects a richer self-evaluation of answer quality than token log-probabilities alone.
Contribution
It provides convergent evidence that LLMs cache confidence representations during answer generation, and that these representations carry information beyond token log-probabilities. This bears on how model metacognition is understood.
Findings
Confidence is cached at answer-adjacent positions before verbalization.
Cached confidence representations explain variance beyond token log-probabilities.
Verbal confidence reflects automatic, sophisticated self-evaluation in LLMs.
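One common way to make a claim like "explains variance beyond token log-probabilities" concrete is an incremental R² comparison: fit verbal confidence on the log-probability alone, then on the log-probability plus a readout of the cached representation, and look at the gain. The sketch below is only an illustration under stated assumptions, not the paper's analysis. The data are synthetic, and the names `logprob`, `probe`, `confidence`, and `incremental_r2` are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(logprob, probe, confidence):
    """Return (baseline R^2, full R^2, gain) for nested regressions."""
    base = r_squared(logprob.reshape(-1, 1), confidence)
    full = r_squared(np.column_stack([logprob, probe]), confidence)
    return base, full, full - base

# Synthetic stand-ins (assumption): no real model activations are used here.
rng = np.random.default_rng(0)
n = 2000
logprob = rng.normal(size=n)          # answer-token log-probability (standardized)
extra = rng.normal(size=n)            # information not captured by log-probability
probe = 0.6 * logprob + extra         # hypothetical readout of a cached representation
confidence = 0.5 * logprob + 0.5 * extra + 0.3 * rng.normal(size=n)

base, full, gain = incremental_r2(logprob, probe, confidence)
print(f"baseline R^2={base:.3f}  full R^2={full:.3f}  gain={gain:.3f}")
```

Because the two regressions are nested, the full model's R² can never fall below the baseline, so a small positive gain is not informative by itself. In practice the gain would be evaluated on held-out data or with a significance test.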
Abstract
Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B (across TriviaQA, BigMath, and MMLU), Qwen 2.5 7B, and the reasoning model Magistral Small 24B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the…
