The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

Jiaxin Zhang; Xiangyu Peng; Qinglin Chen; Qinyuan Ye; Caiming Xiong; Chien-Sheng Wu

arXiv:2604.16830·cs.LG·April 21, 2026

TL;DR

This paper identifies a calibration failure in on-policy distillation (OPD) of language models: OPD improves task accuracy but leaves models severely overconfident. It proposes CaOPD, a calibration-aware framework that improves confidence calibration without sacrificing task performance, and demonstrates its effectiveness across models and domains.

Contribution

The paper introduces CaOPD, a calibration-aware distillation method that aligns the model's reported confidence with the information available at deployment, addressing the overconfidence of existing OPD approaches.

Findings

01. CaOPD achieves Pareto-optimal calibration while maintaining competitive capability.

02. CaOPD generalizes robustly under out-of-distribution and continual-learning settings.

03. Teacher supervision formed under privileged context causes a systematic optimism bias.

Abstract

On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formalize this perspective theoretically, showing that teacher-conditioned success is generally not a valid target for deployment-time confidence and that helpful privileged context induces entropy collapse and a systematic optimism bias. To address this, we propose a calibration-aware OPD framework, CaOPD, that estimates empirical confidence from model rollouts, replaces self-reported confidence with…
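The abstract is cut off above, so the following is only a minimal sketch of the one step it states in full: estimating empirical confidence from model rollouts. It is not the authors' implementation. The function names `empirical_confidence` and `expected_calibration_error` are hypothetical, and expected calibration error (ECE) is a standard calibration metric assumed here for illustration; the text above does not say which metric the paper uses.

```python
from typing import Sequence


def empirical_confidence(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled rollouts judged correct for one prompt.

    Assumption: each rollout has already been graded as correct or not.
    """
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    for c, y in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        acc_sum[idx] += float(y)

    # (count / n) * |acc_sum / count - conf_sum / count| simplifies to
    # |acc_sum - conf_sum| / n, so empty bins contribute zero.
    return sum(abs(a - c) for a, c in zip(acc_sum, conf_sum)) / n


if __name__ == "__main__":
    # Hypothetical graded rollouts for a single prompt: 1 of 4 correct.
    rollouts = [True, False, False, False]

    # A model that self-reports 0.9 on every rollout is overconfident.
    print(expected_calibration_error([0.9] * 4, rollouts))

    # Using the rollout-based estimate as the confidence removes the gap.
    p_hat = empirical_confidence(rollouts)
    print(p_hat, expected_calibration_error([p_hat] * 4, rollouts))
```

In this toy case the self-reported confidence of 0.9 sits far above the observed success rate, which is the kind of gap the abstract calls overconfidence. The rollout-based estimate matches the observed rate by construction, so it is only a meaningful target when computed from rollouts the model would also produce at deployment.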

Code & Models

Repositories

SalesforceAIResearch/CaOPD (GitHub)
