Task-Aware Calibration: Provably Optimal Decoding in LLMs

Tim Tomov; Dominik Fuchsgruber; Rajeev Verma; Stephan Günnemann

arXiv:2605.10202·cs.LG·May 12, 2026

TL;DR

This paper introduces task calibration for LLMs, aligning their output distributions in task-specific latent spaces to achieve optimal decoding and improve generation quality.

Contribution

It proposes a novel task calibration paradigm and a decision-theoretic approach for optimal decoding in LLMs, along with a new calibration metric called TCE.

Findings

1. MBR decoding on task-calibrated distributions is optimal.

2. Task calibration improves generation quality across tasks.

3. TCE quantifies calibration-related excess loss.

Abstract

LLM decoding often relies on the model's predictive distribution to generate an output. Consequently, misalignment with respect to the true generating distribution leads to suboptimal decisions in practice. While a natural solution is to calibrate the model's output distribution, for LLMs, this is ill-posed at the combinatorially vast level of free-form language. We address this by building on the insight that in many tasks, these free-form outputs can be interpreted in a semantically meaningful latent structure, for example, discrete class labels, integers, or sets. We introduce task calibration as a paradigm to calibrate the model's predictive distribution in the task-induced latent space. We apply a decision-theoretic result to show that Minimum Bayes Risk (MBR) decoding on the task-calibrated latent distribution is the optimal decoding strategy on latent model beliefs. Empirically,…
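
To make the decoding step in the abstract concrete, the following is a minimal sketch of Minimum Bayes Risk decoding over a small discrete latent space. It is not the authors' implementation: the function name `mbr_decode`, the example `belief` distribution, and the two loss functions are hypothetical, and the belief is simply assumed to be already task-calibrated. The sketch picks the candidate with the lowest expected loss under the belief, which can differ from the most probable outcome.

```python
from typing import Callable, Dict, Iterable


def mbr_decode(
    belief: Dict[int, float],
    candidates: Iterable[int],
    loss: Callable[[int, int], float],
) -> int:
    """Return the candidate with minimum expected loss under `belief`.

    `belief` maps each latent outcome to its probability. It is assumed
    (for illustration only) to be a calibrated distribution over the
    task's latent space, here a handful of integers.
    """

    def expected_loss(action: int) -> float:
        # Expected loss of committing to `action`, averaged over outcomes.
        return sum(p * loss(action, outcome) for outcome, p in belief.items())

    return min(candidates, key=expected_loss)


def zero_one(action: int, outcome: int) -> float:
    # Exact-match loss: any wrong answer costs 1.
    return 0.0 if action == outcome else 1.0


def abs_error(action: int, outcome: int) -> float:
    # Distance-aware loss: being off by two is worse than being off by one.
    return float(abs(action - outcome))


# Hypothetical latent belief over the integers 0, 1, 2.
belief = {0: 0.4, 1: 0.25, 2: 0.35}

mode_choice = mbr_decode(belief, belief.keys(), zero_one)
abs_choice = mbr_decode(belief, belief.keys(), abs_error)
print(mode_choice, abs_choice)
```

Under the exact-match loss the decision coincides with the most probable outcome, while under the absolute-error loss it moves to the middle value. This illustrates why the decision depends on both the task loss and the latent distribution, so a miscalibrated belief can lead to a worse decision.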
