Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning

Ibne Farabi Shihab; Sanjeda Akter; Anuj Sharma

arXiv:2604.23987·cs.LG·April 28, 2026

TL;DR

This paper shows that during lifelong fine-tuning of large language models, conformal coverage can deteriorate faster than accuracy, and it introduces a calibration replay method to maintain coverage.

Contribution

The paper demonstrates that coverage can collapse before accuracy drops, and it proposes a lightweight calibration replay technique to preserve coverage during continual learning.

Findings

01

Coverage loss exceeds accuracy loss by about 3.4 times on average.

02

In the most pronounced case, coverage drops from 0.92 to 0.61 while accuracy stays within three points of baseline.

03

Calibration replay restores coverage within two points of the nominal level.
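
As a rough consistency check on finding 02, the arithmetic can be written out. This assumes that "points" means percentage points and that accuracy fell rather than rose; the symbols \(\Delta_{\text{cov}}\) and \(\Delta_{\text{acc}}\) are shorthand introduced here for the coverage loss and accuracy loss, not notation from the paper. With accuracy loss bounded by three points, the loss ratio in that case is at least

\[
\frac{\Delta_{\text{cov}}}{\Delta_{\text{acc}}} \;\ge\; \frac{0.92 - 0.61}{0.03} \;=\; \frac{0.31}{0.03} \;\approx\; 10.3,
\]

which is consistent with a most pronounced case sitting well above the roughly \(3.4\times\) average.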

Abstract

Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because uncertainty reliability can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverage and calibration error on sequentially fine-tuned models across three model families and eight task sequences drawn primarily from classification and multiple-choice benchmarks. Across the classification-style settings we study, coverage loss exceeds accuracy loss by a factor of roughly \(3.4\times \pm 0.5\times\) on average across seeds; in the most pronounced case, coverage drops from \(0.92\) to \(0.61\), while accuracy remains within three points of baseline. Standard continual-learning methods that preserve accuracy do not automatically preserve coverage, and naive…
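
To make the two measured quantities concrete, here is a minimal sketch of split conformal coverage and expected calibration error (ECE) for a classifier. It is not the paper's code. The score function (one minus the true-class probability), the 10-bin ECE, the function names, and the synthetic data are all assumptions chosen for illustration. The "drift" is simulated by sharpening the logits by a factor of 3, a stand-in for overconfidence that leaves every argmax prediction (and so accuracy) unchanged.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def lac_scores(probs, labels):
    # Nonconformity score (an assumed choice): 1 - probability of the true label.
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    # Split conformal: take the ceil((n + 1)(1 - alpha))-th smallest calibration score.
    scores = np.sort(lac_scores(cal_probs, cal_labels))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else float(scores[k - 1])

def empirical_coverage(probs, labels, qhat):
    # The true label is in the prediction set exactly when its score is <= qhat.
    return float(np.mean(lac_scores(probs, labels) <= qhat))

def expected_calibration_error(probs, labels, n_bins=10):
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Bin weight times the gap between accuracy and mean confidence.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

def accuracy(probs, labels):
    return float(np.mean(probs.argmax(axis=1) == labels))

# Synthetic data (hypothetical): labels are drawn from the model's own
# probabilities, so the "base" model is calibrated by construction.
rng = np.random.default_rng(0)

def sample(n, k=4):
    logits = rng.normal(scale=2.0, size=(n, k))
    probs = softmax(logits)
    u = rng.random(n)
    labels = np.minimum((probs.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)
    return logits, labels

cal_logits, cal_labels = sample(2000)
test_logits, test_labels = sample(2000)

# Threshold calibrated once on the base model, then reused after drift.
qhat = conformal_threshold(softmax(cal_logits), cal_labels, alpha=0.1)

base_probs = softmax(test_logits)
drift_probs = softmax(3.0 * test_logits)  # same argmax, overconfident probabilities

cov_base = empirical_coverage(base_probs, test_labels, qhat)
cov_drift = empirical_coverage(drift_probs, test_labels, qhat)

print(f"accuracy  base={accuracy(base_probs, test_labels):.3f}  "
      f"drift={accuracy(drift_probs, test_labels):.3f}")
print(f"coverage  base={cov_base:.3f}  drift={cov_drift:.3f}")
print(f"ECE       base={expected_calibration_error(base_probs, test_labels):.3f}  "
      f"drift={expected_calibration_error(drift_probs, test_labels):.3f}")
```

In this toy setup the stale threshold `qhat` is the failure point: accuracy is identical before and after the drift, but coverage computed with the old threshold falls below the nominal \(1 - \alpha\) level. Recomputing `qhat` on scores from the drifted model would restore nominal coverage here; whether that matches the paper's calibration replay procedure is not something the truncated abstract confirms.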
