BaseCal: Unsupervised Confidence Calibration via Base Model Signals
Hexiang Tan, Wanli Yang, Junwei Zhang, Xin Chen, Rui Tang, Du Su, Jingang Wang, Yuanzhuo Wang, Fei Sun, Xueqi Cheng

TL;DR
This paper introduces BaseCal, an unsupervised method that calibrates the confidence of post-trained large language models by using their base models as a reference, improving calibration without requiring labels.
Contribution
Proposes two unsupervised, plug-and-play methods, BaseCal-ReEval and BaseCal-Proj, that calibrate post-trained LLMs (PoLLMs) using signals from their base LLMs, reducing overconfidence.
Findings
BaseCal reduces Expected Calibration Error (ECE) by 42.90% on average.
BaseCal methods outperform existing unsupervised calibration baselines.
Experiments conducted across five datasets and three LLM families validate effectiveness.
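For readers unfamiliar with the metric, the following is a minimal sketch of how Expected Calibration Error is commonly computed: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The function name, the choice of 10 bins, and the toy inputs are assumptions for illustration; the summary does not state which binning the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative; binning scheme is an assumption).

    confidences: floats in [0, 1], one per prediction.
    correct: truthy/falsy flags, one per prediction.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to a bin; confidence 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(is_correct)))

    # Sum |accuracy - mean confidence| over non-empty bins, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data: same 50% accuracy, but very different stated confidence.
overconfident = expected_calibration_error([0.95] * 4, [1, 0, 1, 0])
calibrated = expected_calibration_error([0.5] * 4, [1, 0, 1, 0])
```

In this toy case the overconfident predictor has a much larger ECE than the calibrated one, which is the kind of gap the Findings refer to.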
Abstract
Reliable confidence is essential for trusting the outputs of LLMs, yet widely deployed post-trained LLMs (PoLLMs) typically compromise this trust with severe overconfidence. In contrast, we observe that their corresponding base LLMs often remain well-calibrated. This naturally motivates us to calibrate PoLLM confidence using the base LLM as a reference. This work proposes two ways to achieve this. A straightforward solution, BaseCal-ReEval, evaluates PoLLM's responses by feeding them into the base LLM to get average probabilities as confidence. While effective, this approach introduces additional inference overhead. To address this, we propose BaseCal-Proj, which trains a lightweight projection to map the final-layer hidden states of PoLLMs back to those of their base LLMs. These projected states are then processed by the base LLM's output layer to derive base-calibrated confidence for…
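The two confidence computations described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the per-token log-probabilities are assumed to come from scoring the PoLLM's response with the base LLM, "average probabilities" is read as an arithmetic mean over response tokens, and the "lightweight projection" is assumed to be a single affine map. All function and parameter names are hypothetical.

```python
import math


def average_token_probability(token_logprobs):
    """ReEval-style confidence (assumed form): mean probability the base LLM
    assigns to each token of the PoLLM's response."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def projected_token_probability(hidden, proj_weight, proj_bias,
                                output_weight, token_id):
    """Proj-style confidence for one token (assumed form).

    hidden: final-layer hidden state of the PoLLM for this position.
    proj_weight, proj_bias: the affine projection (hypothetical parameters),
        mapping the PoLLM state toward the base LLM's hidden space.
    output_weight: the base LLM's output layer, one row per vocabulary item.
    """
    projected = [p + b for p, b in zip(matvec(proj_weight, hidden), proj_bias)]
    probs = softmax(matvec(output_weight, projected))
    return probs[token_id]


# Toy usage with a 2-dimensional hidden state and a 2-token vocabulary.
reeval_conf = average_token_probability([math.log(0.5), math.log(1.0)])
identity = [[1.0, 0.0], [0.0, 1.0]]
proj_conf = projected_token_probability(
    hidden=[math.log(3.0), 0.0],
    proj_weight=identity,
    proj_bias=[0.0, 0.0],
    output_weight=identity,
    token_id=0,
)
```

The second function avoids a separate forward pass through the base LLM, which is the inference overhead the abstract attributes to BaseCal-ReEval; how the projection is actually trained is not covered in this excerpt.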
