Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning

Lorenzo Jaime Yu Flores; Cesare Spinoso di-Piano; Jackie Chi Kit Cheung

arXiv:2604.08974·cs.CL·April 13, 2026

TL;DR

This paper investigates how supervised fine-tuning affects the reliability of confidence scores in language models, revealing that fine-tuning can degrade their correlation with output quality and emphasizing the need for more robust metrics.

Contribution

The paper analyzes how sensitive confidence scores are to supervised fine-tuning and argues for more stable uncertainty quantification methods.

Findings

01. Confidence scores' correlation with quality degrades after fine-tuning.

02. Changes in confidence scores can be due to factors other than output quality.

03. Failing to address this reduces confidence scores' usefulness in downstream tasks.

Abstract

Uncertainty quantification is a set of techniques that measure confidence in language models. They can be used, for example, to detect hallucinations or alert users to review uncertain predictions. To be useful, these confidence scores must be correlated with the quality of the output. However, recent work found that fine-tuning can affect the correlation between confidence scores and quality. Hence, we investigate the underlying behavior of confidence scores to understand their sensitivity to supervised fine-tuning (SFT). We find that post-SFT, the correlation of various confidence scores degrades, which can stem from changes in confidence scores due to factors other than the output quality, such as the output's similarity to the training distribution. We demonstrate via a case study how failing to address this miscorrelation reduces the usefulness of the confidence scores on a…
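The requirement that confidence scores correlate with output quality can be made concrete with a rank correlation. The following is a minimal sketch, not the authors' method: it assumes a length-normalized token log-probability as the confidence score and Spearman's rank correlation as the agreement measure, neither of which the abstract specifies. The function names and the toy numbers are hypothetical, invented purely for illustration, and are not results from the paper.

```python
import math


def mean_logprob_confidence(token_logprobs):
    """Length-normalized sequence log-probability (assumed confidence score)."""
    return sum(token_logprobs) / len(token_logprobs)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy per-token log-probabilities for four outputs (illustrative only).
outputs = [[-0.1, -0.2], [-0.5, -0.7], [-1.0, -1.4], [-2.0, -2.2]]
confidences = [mean_logprob_confidence(lp) for lp in outputs]

# Toy quality scores for the same four outputs (illustrative only).
quality_aligned = [0.9, 0.7, 0.5, 0.2]      # higher confidence, higher quality
quality_misaligned = [0.5, 0.9, 0.2, 0.7]   # confidence no longer tracks quality

print(spearman(confidences, quality_aligned))
print(spearman(confidences, quality_misaligned))
```

Running the same comparison on a model before and after SFT, with real confidence and quality scores in place of the toy lists, is one way to observe the kind of degradation the abstract describes.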