CATTO: Balancing Preferences and Confidence in Language Models

Nisarg Parikh; Ananya Sai; Pannaga Shivaswamy; Kunjal Panchal; Andrew Lan

arXiv:2601.23096·cs.LG·February 3, 2026

TL;DR

This paper introduces CATTO, a calibration-aware training objective for language models that improves confidence calibration without sacrificing question-answering accuracy, so that a model's confidence is a more reliable signal of whether its prediction is correct.

Contribution

We propose CATTO, a novel training method that aligns model confidence with correctness, and introduce Confidence@k, a test-time scaling technique for better token selection.

Findings

01. CATTO reduces Expected Calibration Error (ECE) relative to DPO in both in-distribution and out-of-distribution settings.

02. CATTO maintains or improves question-answering accuracy across multiple datasets.

03. Confidence@k uses calibrated probabilities to improve output token selection.

Abstract

Large language models (LLMs) often make accurate next-token predictions, but their confidence in these predictions can be poorly calibrated: high-confidence predictions are frequently wrong, and low-confidence predictions may be correct. Preference-based alignment methods exacerbate this miscalibration by breaking the link between predictive probability and correctness. We introduce the Calibration Aware Token-level Training Objective (CATTO), which aligns predicted confidence with empirical prediction correctness and can be combined with the original preference optimization objectives. Empirically, CATTO reduces Expected Calibration Error (ECE) by 2.22%-7.61% in-distribution and 1.46%-10.44% out-of-distribution compared to direct preference optimization (DPO), and by 0.22%-1.24% in-distribution and 1.23%-5.07% out-of-distribution compared to the…
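For readers unfamiliar with the metric, the following is a minimal sketch of the standard binned Expected Calibration Error: predictions are grouped by confidence, and the gap between average confidence and empirical accuracy in each bin is averaged, weighted by bin size. This is not the paper's evaluation code. The function name, the choice of 10 equal-width bins, and the left-closed bin boundaries are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probabilities in [0, 1] for the chosen tokens/answers.
    correct: matching sequence of booleans (or 0/1) marking correct predictions.
    n_bins: number of equal-width confidence bins (assumed value, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin [i/n_bins, (i+1)/n_bins); 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(is_correct)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Overconfident toy example: 90% stated confidence, 75% actual accuracy.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
# Calibrated toy example: 50% stated confidence, 50% actual accuracy.
calibrated = expected_calibration_error([0.5, 0.5], [1, 0])
```

In the first toy example the model claims 0.9 confidence but is right only three times out of four, so the ECE is about 0.15; in the second the stated confidence matches the accuracy and the ECE is zero. A reduction in ECE, as reported in the abstract, means this gap shrinks.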

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education