CATTO: Balancing Preferences and Confidence in Language Models
Nisarg Parikh, Ananya Sai, Pannaga Shivaswamy, Kunjal Panchal, Andrew Lan

TL;DR
This paper introduces CATTO, a calibration-aware training objective for language models that improves confidence calibration without sacrificing accuracy, making model predictions more reliable.
Contribution
We propose CATTO, a novel training method that aligns model confidence with correctness, and introduce Confidence@k, a test-time scaling technique for better token selection.
Findings
CATTO reduces calibration error significantly in both in-distribution and out-of-distribution settings.
CATTO maintains or improves question-answering accuracy across multiple datasets.
Confidence@k enhances output token selection using calibrated probabilities.
Abstract
Large language models (LLMs) often make accurate next-token predictions, but their confidence in these predictions can be poorly calibrated: high-confidence predictions are frequently wrong, and low-confidence predictions may be correct. Preference-based alignment methods exacerbate this miscalibration by breaking the link between predictive probability and correctness. We introduce the Calibration Aware Token-level Training Objective (CATTO), which aligns predicted confidence with empirical prediction correctness and can be combined with the original preference optimization objectives. Empirically, CATTO reduces Expected Calibration Error (ECE) by 2.22%-7.61% in-distribution and 1.46%-10.44% out-of-distribution compared to direct preference optimization (DPO), and by 0.22%-1.24% in-distribution and 1.23%-5.07% out-of-distribution compared to the…
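The abstract reports its results in Expected Calibration Error (ECE). As a rough illustration of what that metric measures, the sketch below computes the standard binned ECE: predictions are grouped by confidence, and the gap between average confidence and accuracy in each bin is weighted by the bin's share of predictions. The function name, the choice of 10 equal-width bins, and the left-closed bin edges are assumptions made for this example; the paper's exact evaluation setup is not given in the text above.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE over equal-width confidence bins on [0, 1].

    confidences: predicted probabilities of the chosen answers/tokens.
    correct:     1 (or True) if the prediction was right, else 0.
    n_bins:      number of bins (10 is a common default, assumed here).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins   # summed confidence per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, y in zip(confidences, correct):
        # Bin b covers [b / n_bins, (b + 1) / n_bins); 1.0 goes in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hit_sum[b] += float(y)
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 90% confidence, three of them correct:
    # the single occupied bin has accuracy 0.75 and confidence 0.90.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A model that is right 75% of the time on predictions it scores at 0.9 is overconfident by 0.15 in that bin, and that gap is the quantity ECE aggregates. The percentage reductions quoted in the abstract refer to this kind of measure.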
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
