Does Alignment Tuning Really Break LLMs' Internal Confidence?

Hongseok Oh; Wonseok Hwang

arXiv:2409.00352 · cs.CL · February 11, 2025

TL;DR

This paper investigates how alignment tuning affects the calibration of LLMs. It finds that alignment generally degrades calibration, meaning the model's confidence matches its actual accuracy less well, and it emphasizes the need for methods that preserve both alignment and calibration.

Contribution

It provides a comprehensive analysis of calibration degradation due to alignment tuning across four dimensions (models, calibration metrics, tasks, and confidence extraction methods) and highlights the importance of careful confidence measurement.
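Why confidence measurement needs care can be shown with a small sketch. It assumes a multiple-choice question where the model's next-token log-probabilities for the option letters are available. The function names `raw_confidence` and `normalized_confidence` and the example numbers are hypothetical and are not taken from the paper or its repository. The same prediction yields different confidence values depending on whether probability mass on non-option tokens is kept or renormalized away.

```python
import math


def raw_confidence(option_logprobs: dict[str, float]) -> tuple[str, float]:
    """Hypothetical helper: return the top option and its probability under the
    full-vocabulary distribution; mass on non-option tokens is simply ignored."""
    best = max(option_logprobs, key=option_logprobs.get)
    return best, math.exp(option_logprobs[best])


def normalized_confidence(option_logprobs: dict[str, float]) -> tuple[str, float]:
    """Hypothetical helper: return the top option and its probability after
    renormalizing over the option tokens only (a numerically stable softmax)."""
    best = max(option_logprobs, key=option_logprobs.get)
    peak = max(option_logprobs.values())
    total = sum(math.exp(lp - peak) for lp in option_logprobs.values())
    return best, math.exp(option_logprobs[best] - peak) / total


# Hypothetical log-probs: the four options together hold only half of the mass.
logprobs = {
    "A": math.log(0.30),
    "B": math.log(0.10),
    "C": math.log(0.05),
    "D": math.log(0.05),
}
print(raw_confidence(logprobs))         # option "A", confidence about 0.30
print(normalized_confidence(logprobs))  # option "A", confidence about 0.60
```

Both functions pick option A, but the reported confidence is about 0.30 under the first and 0.60 under the second, so any calibration error computed from them will differ as well.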

Findings

01. Alignment tuning often degrades LLM calibration.

02. Calibration and alignment are not always a trade-off, but under stricter analysis conditions alignment consistently harms calibration.

03. Future algorithms should aim to improve both calibration and instruction-following.

Abstract

Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods. Initial analysis showed that the relationship between alignment and calibration is not always a trade-off, but under stricter analysis conditions, we found the alignment process consistently harms calibration. This highlights the need for (1) a careful approach when measuring model confidences and calibration errors and (2) future research into algorithms that can help LLMs to achieve both instruction-following and calibration without sacrificing either.
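One widely used way to summarize calibration error is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by the fraction of predictions in that bin. The sketch below assumes equal-width bins, and the function name is hypothetical. This page does not state which calibration metrics or binning settings the authors used, so treat it as an illustration of the general metric rather than their exact protocol.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    # Each bin collects (confidence, correctness) pairs.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Half-open bins [k/M, (k+1)/M); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        accuracy = sum(ok for _, ok in members) / len(members)
        mean_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical predictions: four at 0.9 confidence (3 correct), two at 0.6 (1 correct).
confs = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
hits = [True, True, True, False, True, False]
print(round(expected_calibration_error(confs, hits), 4))  # 0.1333
```

Because the result depends on choices such as the number of bins and on how confidence was extracted in the first place, two setups can report different calibration errors for the same model, which is one reason to be careful when measuring calibration.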

Code & Models

Repositories

abzb1/alingment_calibration (official)
