Rescaling Confidence: What Scale Design Reveals About LLM Metacognition
Yuyang Dai

TL;DR
This paper investigates how the design of the confidence scale affects the quality of LLMs' verbalized confidence, finding that a coarser 0-20 scale improves metacognitive efficiency over the standard 0-100 format.
Contribution
It systematically manipulates confidence scales along three dimensions (granularity, boundary placement, and range regularity) and measures the effect on LLM metacognitive sensitivity using meta-d'.
Findings
A 0-20 scale improves metacognitive efficiency over the standard 0-100 format
Boundary compression degrades performance
Round-number preferences persist despite irregular ranges
Abstract
Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0-100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions: granularity, boundary placement, and range regularity, and evaluate metacognitive sensitivity using meta-d'. We find that a 0-20 scale consistently improves metacognitive efficiency over the standard 0-100 format, while boundary compression degrades performance and round-number preferences persist even under irregular ranges. These results demonstrate that confidence scale…
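The discretization statistic mentioned in the abstract (the share of responses that fall on the few most common confidence values) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `top_k_share` and the sample `reports` list are hypothetical and used only for illustration.

```python
from collections import Counter


def top_k_share(confidences, k=3):
    """Return the k most frequent confidence values and the
    fraction of all responses that fall on those values."""
    if not confidences:
        raise ValueError("confidences must be non-empty")
    counts = Counter(confidences)
    top = counts.most_common(k)  # list of (value, count), most frequent first
    share = sum(n for _, n in top) / len(confidences)
    return [value for value, _ in top], share


# Hypothetical verbalized confidences on a 0-100 scale (made-up data).
reports = [90, 95, 90, 80, 95, 90, 85, 100, 90, 95]
values, share = top_k_share(reports, k=3)
print(values, share)
```

A share close to 1 for a small `k` would indicate that the reported confidences occupy only a handful of points on the scale, whatever its nominal resolution.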
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics
