Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging

Tiancheng Hu; Benjamin Minixhofer; Nigel Collier

arXiv:2510.17426·cs.CL·November 3, 2025

TL;DR

This paper introduces a simple post-hoc model merging technique for navigating the alignment-calibration trade-off. The merged models exceed both parents in accuracy while recovering much of the calibration lost during alignment, which mitigates the alignment tax at low computational cost.

Contribution

The paper demonstrates that interpolating between a model's pre- and post-alignment weights reveals Pareto-optimal solutions: merged models that exceed both parents in accuracy while substantially recovering the calibration lost during alignment.

Findings

01

Interpolating models recovers calibration lost during alignment.

02

Model merging reveals Pareto-optimal trade-offs.

03

Merged models exceed both parent models in accuracy.
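The findings above depend on having a number for calibration. This page does not say which metric the paper reports, so the sketch below assumes expected calibration error (ECE) with equal-width confidence bins, a common choice. The function name `expected_calibration_error` and the default `n_bins=10` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Assumed metric: ECE with equal-width bins over [0, 1].

    confidences: the model's probability for its predicted answer.
    correct: whether each predicted answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions by confidence; a confidence of 1.0 goes in the last bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    # Weighted average of |mean confidence - accuracy| across non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# An overconfident model: 95% confident, but right only half the time.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
# A model that is fully confident and always right.
calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

Under this assumed metric, lower is better: a model that loses calibration during alignment would show a higher ECE than its pre-alignment parent on the same evaluation set.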

Abstract

The "alignment tax" of post-training is typically framed as a drop in task accuracy. We show it also involves a severe loss of calibration, making models overconfident and less reliable, and their outputs less diverse. We show that this trade-off can be navigated effectively via a simple post-hoc intervention: interpolating between a model's weights before and after alignment. Crucially, this is not a strict trade-off. We find that the process consistently reveals Pareto-optimal interpolations: models that improve accuracy beyond both parents while substantially recovering the calibration lost during alignment. Our work demonstrates that simple model merging provides a computationally efficient method for mitigating the full scope of the alignment tax, yielding models that are more capable and more reliable.
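The intervention described in the abstract can be sketched as follows, assuming plain linear interpolation with a single coefficient shared by all parameters. The abstract does not specify the exact merging scheme, so the function `interpolate_weights`, the coefficient name `lam`, and the toy checkpoints are hypothetical. Real checkpoints would hold tensors rather than Python lists.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(base: Weights, aligned: Weights, lam: float) -> Weights:
    """Return (1 - lam) * base + lam * aligned for every parameter.

    lam = 0.0 gives the pre-alignment model, lam = 1.0 the aligned model.
    Linear interpolation is an assumption made for this sketch.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if base.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: Weights = {}
    for name, base_values in base.items():
        aligned_values = aligned[name]
        if len(base_values) != len(aligned_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - lam) * b + lam * a
            for b, a in zip(base_values, aligned_values)
        ]
    return merged


# Hypothetical two-parameter checkpoints, before and after alignment.
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
aligned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}

# Sweep the coefficient; each merged model would then be scored for
# accuracy and calibration to trace out the trade-off curve.
sweep = {lam: interpolate_weights(base, aligned, lam) for lam in (0.0, 0.5, 1.0)}
```

Because the merge only averages existing weights, it needs no gradient updates. The cost is one pass over the parameters per coefficient, plus an evaluation of each merged model.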

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition