Self-Calibrating Language Models via Test-Time Discriminative Distillation

Mohamed Rissal Hedna; Jan Strich; Martin Semmann; Chris Biemann

arXiv:2604.09624 · cs.CL · April 14, 2026

TL;DR

SECL is a self-supervised, test-time training method that improves language model calibration by exploiting the model's own discriminative signals without requiring labeled data.

Contribution

It introduces a test-time training pipeline that improves the calibration of language models through label-free self-supervision, particularly under distribution shift.

Findings

01

SECL reduces Expected Calibration Error by 56-78%.

02

It outperforms existing inference-time calibration methods.

03

SECL requires only 6-26% of the question stream for training.
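
Expected Calibration Error (ECE), the metric behind the first finding, is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below assumes 10 equal-width bins, a common default. The bin count, the function names, and the reading of "56-78%" as a relative reduction are assumptions, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-binned ECE: size-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, is_correct in zip(confidences, correct):
        # Bin b covers [b/n_bins, (b+1)/n_bins); confidence 1.0 joins the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += 1.0 if is_correct else 0.0
    ece = 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count:
            # |bin accuracy - bin mean confidence|, weighted by the bin's share of samples
            ece += (count / n) * abs(hit_sum / count - conf_sum / count)
    return ece


def relative_reduction(ece_before, ece_after):
    """Fractional drop in ECE; 0.75 means a 75% reduction (hypothetical helper)."""
    return (ece_before - ece_after) / ece_before
```

For example, four answers stated at 0.9 confidence of which only one is correct give an ECE of about 0.65, the kind of overconfidence the paper targets.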

Abstract

Large language models (LLMs) are systematically overconfident: they routinely express high certainty on questions they often answer incorrectly. Existing calibration methods either require labeled validation data, degrade under distribution shifts, or incur substantial inference costs. Recent work has shown that LLMs already contain a better-calibrated signal than the one they verbalize: the token probability of "True" when the model is asked "Is this answer correct?" ($P(\text{True})$) consistently outperforms their stated confidence. This gap is theoretically grounded, since generative error is lower-bounded by roughly twice the corresponding discriminative error. We introduce SECL (SElf-Calibrating Language Models), a test-time training (TTT) pipeline that exploits this gap as label-free self-supervision, requiring no labeled data or human…
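
The $P(\text{True})$ signal described above can be read off the model's next-token logits. The sketch below assumes the score is renormalized over just the "True" and "False" tokens, so it reduces to a sigmoid of their logit difference, and shows one plausible way to use it as a label-free target: a soft binary cross-entropy that pulls the model's stated confidence toward $P(\text{True})$. Both the renormalization and the loss are illustrative assumptions; this excerpt does not specify SECL's exact objective, and the function names are hypothetical.

```python
import math


def p_true(logit_true, logit_false):
    """P(True) renormalized over the two answer tokens {True, False}.

    Equals softmax([logit_true, logit_false])[0] = sigmoid(logit_true - logit_false).
    """
    d = logit_true - logit_false
    # Branch on the sign so math.exp never receives a large positive argument.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def soft_bce(stated_conf, target, eps=1e-7):
    """Cross-entropy of a stated confidence against a soft target in [0, 1].

    Hypothetical distillation loss: minimized when stated_conf == target.
    """
    p = min(max(stated_conf, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Illustrative numbers only: the self-check favors "True" by 1.5 logits,
# while the model verbalized 95% confidence.
target = p_true(logit_true=1.5, logit_false=0.0)  # about 0.82
loss = soft_bce(0.95, target)
```

Minimizing a loss of this shape on unlabeled questions would move verbalized confidence toward the discriminative score, which is the generative/discriminative gap the abstract describes SECL exploiting.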

Code & Models

Repositories

https://anonymous.4open.science/r/secl-emnlp26-submission-C890