Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning
Lorenzo Jaime Yu Flores, Cesare Spinoso di-Piano, Jackie Chi Kit Cheung

TL;DR
This paper investigates how supervised fine-tuning affects the reliability of confidence scores in language models, revealing that fine-tuning can degrade their correlation with output quality and emphasizing the need for more robust metrics.
Contribution
It provides an analysis of the sensitivity of confidence scores to fine-tuning and highlights the importance of developing more stable uncertainty quantification methods.
Findings
Confidence scores' correlation with quality degrades after fine-tuning.
Changes in confidence scores can be due to factors other than output quality.
Failing to address this reduces confidence scores' usefulness in downstream tasks.
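The first finding concerns the correlation between a confidence score and output quality. The sketch below shows one way such a correlation can be measured. It makes two illustrative choices that this page does not confirm for the paper. The confidence score is the length-normalized sequence probability, meaning the exponential of the mean token log-probability. The correlation measure is Spearman rank correlation. The helper names and the toy numbers are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def mean_logprob_confidence(token_logprobs: Sequence[float]) -> float:
    """Illustrative confidence score: exp of the mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def _average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sx * sy)


# Made-up toy values, for illustration only.
quality = [0.9, 0.7, 0.5, 0.3]      # per-output quality metric
conf_before = [0.8, 0.6, 0.4, 0.2]  # confidence ranks outputs like quality
conf_after = [0.5, 0.9, 0.6, 0.8]   # confidence driven by something else
print(round(spearman(conf_before, quality), 3))  # 1.0
print(round(spearman(conf_after, quality), 3))   # -0.4
```

Rank correlation is used here because it only asks whether higher confidence goes with higher quality. It does not require the two scales to be linearly related.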
Abstract
Uncertainty quantification is a set of techniques that measure confidence in language models. They can be used, for example, to detect hallucinations or alert users to review uncertain predictions. To be useful, these confidence scores must be correlated with the quality of the output. However, recent work found that fine-tuning can affect the correlation between confidence scores and quality. Hence, we investigate the underlying behavior of confidence scores to understand their sensitivity to supervised fine-tuning (SFT). We find that post-SFT, the correlation of various confidence scores degrades, which can stem from changes in confidence scores due to factors other than the output quality, such as the output's similarity to the training distribution. We demonstrate via a case study how failing to address this miscorrelation reduces the usefulness of the confidence scores on a…
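One use named in the abstract is alerting users to review uncertain predictions. The sketch below shows why miscorrelation hurts that use. It assumes a fixed review budget, in which the least-confident fraction of outputs is flagged. It also assumes a hypothetical quality threshold below which an output counts as "bad". The function names, the threshold, and the toy numbers are all made up for illustration. This is not the paper's case study.

```python
import math
from typing import Sequence


def flag_for_review(confidences: Sequence[float], fraction: float) -> set[int]:
    """Indices of the `fraction` least-confident outputs (count rounded up)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = math.ceil(fraction * len(confidences))
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return set(order[:k])


def recall_of_bad_outputs(
    confidences: Sequence[float],
    quality: Sequence[float],
    fraction: float,
    bad_threshold: float,
) -> float:
    """Share of low-quality outputs (quality < bad_threshold) that get flagged."""
    bad = {i for i, q in enumerate(quality) if q < bad_threshold}
    if not bad:
        return float("nan")  # nothing to catch
    flagged = flag_for_review(confidences, fraction)
    return len(bad & flagged) / len(bad)


# Made-up toy values, for illustration only.
quality = [0.9, 0.7, 0.5, 0.3]
conf_before = [0.8, 0.6, 0.4, 0.2]  # well correlated with quality
conf_after = [0.5, 0.9, 0.6, 0.8]   # miscorrelated with quality
print(recall_of_bad_outputs(conf_before, quality, 0.5, 0.6))  # 1.0
print(recall_of_bad_outputs(conf_after, quality, 0.5, 0.6))   # 0.5
```

The review budget is the same in both runs. With the miscorrelated scores, the budget is partly spent on a good output, so one of the two bad outputs is never shown to a reviewer.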
