ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Yibo Li; Miao Xiong; Jiaying Wu; Bryan Hooi

arXiv:2508.18847 · cs.CL · November 26, 2025

TL;DR

ConfTuner is a novel fine-tuning method for large language models that improves their ability to express calibrated confidence verbally, enhancing trustworthiness and performance in high-stakes applications without requiring ground-truth confidence labels.

Contribution

The paper introduces the tokenized Brier score, a loss function based on a proper scoring rule, for calibrating LLMs' verbalized confidence without ground-truth confidence scores. The method applies to diverse models and tasks.
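To make the idea of a proper scoring rule concrete, here is a minimal sketch of a Brier-style loss computed over a model's probability distribution on confidence tokens. This is an illustration under stated assumptions, not the paper's implementation: the eleven confidence levels (0%, 10%, ..., 100%) and the function names `expected_brier` and `best_level` are hypothetical, and the exact form of the tokenized Brier score should be taken from the paper itself.

```python
# Sketch only: the confidence levels and function names are assumptions
# made for illustration, not taken from the paper.

def expected_brier(probs, levels, correct):
    """Expected squared error between the stated confidence and correctness.

    probs   -- probability the model assigns to each confidence token
    levels  -- numeric confidence each token stands for, in [0, 1]
    correct -- whether the model's answer was actually right
    """
    if len(probs) != len(levels):
        raise ValueError("probs and levels must have the same length")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probs must sum to 1")
    y = 1.0 if correct else 0.0
    return sum(p * (c - y) ** 2 for p, c in zip(probs, levels))


def best_level(accuracy, levels):
    """Confidence level that minimizes the expected Brier loss when the
    answer is right with probability `accuracy`."""
    def risk(c):
        return accuracy * (c - 1.0) ** 2 + (1.0 - accuracy) * c ** 2
    return min(levels, key=risk)


if __name__ == "__main__":
    levels = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100%
    # All probability mass on the "80%" token, and the answer was right.
    probs = [0.0] * 11
    probs[8] = 1.0
    print(expected_brier(probs, levels, correct=True))
    # A model that is right 70% of the time does best by saying 70%.
    print(best_level(0.7, levels))
```

The second function shows why the rule is called proper: the expected loss is lowest when the stated confidence equals the true probability of being correct, so the only supervision needed is whether each answer was right, not a ground-truth confidence score.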

Findings

01  Improves calibration across multiple reasoning tasks

02  Generalizes to black-box models like GPT-4o

03  Enhances downstream self-correction and cascade performance
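Calibration claims like these are commonly quantified with Expected Calibration Error (ECE), which compares stated confidence with observed accuracy inside confidence bins. The sketch below is a generic ECE computation with equal-width bins; it is not drawn from the paper, and the function name and the choice of ten bins are assumptions.

```python
# Sketch only: a standard equal-width-bin ECE, not the paper's evaluation code.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences -- stated confidences in [0, 1], one per answer
    correct     -- booleans, True where the answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        ece += (len(items) / n) * abs(mean_conf - accuracy)
    return ece


if __name__ == "__main__":
    # An overconfident model: always says 100% but is right 1 time in 4.
    print(expected_calibration_error([1.0] * 4, [True, False, False, False]))
```

A perfectly calibrated model scores 0, and the overconfidence described in the abstract shows up as a large gap in the high-confidence bins.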

Abstract

Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare, where accurate expressions of uncertainty are essential for reliability and trust. However, current LLMs are often observed to generate incorrect answers with high confidence, a phenomenon known as "overconfidence". Recent efforts have focused on calibrating LLMs' verbalized confidence: i.e., their expressions of confidence in text form, such as "I am 80% confident that...". Existing approaches either rely on prompt engineering or fine-tuning with heuristically generated uncertainty estimates, both of which have limited effectiveness and generalizability. Motivated by the notion of proper scoring rules for calibration in classical machine learning models, we introduce ConfTuner, a simple and efficient fine-tuning method that introduces minimal overhead and does not require…

Videos

ConfTuner: Training Large Language Models to Express Their Confidence Verbally · SlideSlive

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)