Probabilistic Calibration Is a Trainable Capability in Language Models

Davide Baldelli; Sruthi Kuriakose; Maryam Hashemzadeh; Amal Zouaq; Sarath Chandar

arXiv:2605.11845 · cs.CL · May 13, 2026

TL;DR

This paper shows that probabilistic calibration in language models can be improved directly through fine-tuning. Training on synthetic prompts that require sampling from mathematical distributions, with either of two fine-tuning methods, improves structured-sampling fidelity across 12 models spanning four families.

Contribution

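It introduces two Calibration Fine-Tuning variants: a soft-target method that trains on trie-derived next-token targets, and a hard-target method that trains on completions sampled from the target distribution. Both improve probabilistic calibration in language models.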

Findings

01 Both methods significantly improve structured-sampling fidelity.

02 Hard-target fine-tuning excels in numeric sampling tasks.

03 Soft-target fine-tuning performs better on broad stochastic generation tasks.
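One way to make "sampling fidelity" concrete is to compare the empirical distribution of a model's sampled outputs against the requested target. The sketch below uses total variation distance for this. That choice is an assumption for illustration: this summary does not state which metric the paper uses, and the function name `total_variation` is hypothetical.

```python
from collections import Counter

def total_variation(samples, target):
    """Total variation distance between the empirical distribution of
    `samples` and a `target` dict mapping outcome -> probability.
    Illustrative metric only; not necessarily the paper's measure."""
    counts = Counter(samples)
    n = len(samples)
    # Include outcomes the model produced that the target does not allow.
    support = set(counts) | set(target)
    return 0.5 * sum(abs(counts[x] / n - target.get(x, 0.0)) for x in support)

# A fair coin requested; one sampler alternates, the other always says "H".
fair_coin = {"H": 0.5, "T": 0.5}
balanced = ["H", "T", "H", "T"]
collapsed = ["H", "H", "H", "H"]
print(total_variation(balanced, fair_coin))
print(total_variation(collapsed, fair_coin))
```

A distance of 0 means the samples match the target frequencies exactly, and larger values mean the sampler is further from the requested distribution.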

Abstract

Language models are increasingly used in settings where outputs must satisfy user-specified randomness constraints, yet their generation probabilities are often poorly calibrated to those targets. We study whether this capability can be improved directly through fine-tuning. Concretely, we fine-tune language models on synthetic prompts that require sampling from mathematical distributions, and compare two Calibration Fine-Tuning variants: a soft-target method that converts the desired output distribution into trie-derived next-token targets, and a hard-target method that trains on sampled completions from the same target distribution. Across 12 models spanning four families, both methods substantially improve structured-sampling fidelity on held-out distribution families and unseen parameter settings, showing that probabilistic calibration is a trainable capability. Under our selected…
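The difference between the two training signals can be sketched on a toy distribution. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the character-level tokenizer, the `<eos>` marker, and the function names `soft_targets` and `hard_targets` are all hypothetical. The soft-target function arranges the outcomes in a trie keyed by token prefix and normalizes the probability mass at each node into a next-token distribution. The hard-target function simply draws completions from the same distribution.

```python
import random
from collections import defaultdict

EOS = "<eos>"  # assumed end-of-sequence marker

def char_tokenize(text):
    # Hypothetical tokenizer: one token per character.
    return list(text)

def soft_targets(dist, tokenize=char_tokenize):
    """Map each token prefix to the next-token distribution implied by
    `dist` (outcome string -> probability), via a trie over outcomes."""
    prefix_mass = defaultdict(float)                      # prefix -> mass of outcomes below it
    child_mass = defaultdict(lambda: defaultdict(float))  # prefix -> next token -> mass
    for outcome, p in dist.items():
        tokens = tokenize(outcome) + [EOS]
        for i, tok in enumerate(tokens):
            prefix = tuple(tokens[:i])
            prefix_mass[prefix] += p
            child_mass[prefix][tok] += p
    # Normalize each node's children into a conditional distribution.
    return {
        prefix: {tok: m / prefix_mass[prefix] for tok, m in children.items()}
        for prefix, children in child_mass.items()
    }

def hard_targets(dist, n, seed=0):
    """Draw n completions from `dist` to use as ordinary training labels."""
    rng = random.Random(seed)
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)

# Toy target: three numeric outcomes that share the first token "1".
dist = {"1": 0.5, "10": 0.3, "12": 0.2}
targets = soft_targets(dist)
print(targets[("1",)])        # distribution over what follows the token "1"
print(hard_targets(dist, 5))  # five sampled completions
```

In this toy case the soft target after the prefix `1` spreads mass over stopping, `0`, and `2`, whereas each hard-target example commits to a single completion and conveys the distribution only in aggregate over many samples.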

Code & Models

Repositories

chandar-lab/calibration-finetuning (GitHub)