Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

Parjanya Prajakta Prashant, Jiongli Zhu, Aldan Creo, and Babak Salimi

arXiv:2605.20005 · cs.LG · May 20, 2026

TL;DR

This paper introduces FINCH, a loss-adaptive learning-rate schedule that substantially reduces catastrophic forgetting when fine-tuning large language models, without sacrificing task performance.

Contribution

The paper proposes FINCH, a loss-adaptive learning-rate schedule that controls forgetting during fine-tuning by scaling the learning rate according to the current batch loss, lowering it on high-loss batches.
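A minimal Python sketch of a loss-adaptive learning rate in this spirit is given below. It is an illustration under stated assumptions, not the authors' FINCH implementation: the function name `loss_adaptive_lr`, the reference loss `ref_loss`, and the square-root scaling rule are hypothetical choices. They are picked so that, following the bound stated in the abstract (per-step forgetting bounded by the learning rate times the square root of the training loss), the product lr·√loss never exceeds base_lr·√ref_loss.

```python
import math


def loss_adaptive_lr(batch_loss: float, base_lr: float, ref_loss: float) -> float:
    """Hypothetical loss-adaptive learning rate (an illustration, not FINCH itself).

    Returns base_lr when batch_loss <= ref_loss. Above that, the rate is scaled
    by sqrt(ref_loss / batch_loss), which holds lr * sqrt(batch_loss) at
    base_lr * sqrt(ref_loss), i.e. it caps the forgetting bound from the abstract.
    """
    if batch_loss < 0.0:
        raise ValueError("batch_loss must be non-negative")
    if ref_loss <= 0.0:
        raise ValueError("ref_loss must be positive")
    if batch_loss <= ref_loss:
        # Low-loss batch: train at the full base rate.
        return base_lr
    # High-loss batch: shrink the rate in proportion to 1 / sqrt(batch_loss).
    return base_lr * math.sqrt(ref_loss / batch_loss)


if __name__ == "__main__":
    base_lr, ref_loss = 1e-4, 1.0  # hypothetical hyperparameters
    for loss in [0.5, 1.0, 4.0, 16.0]:
        lr = loss_adaptive_lr(loss, base_lr, ref_loss)
        # The product lr * sqrt(loss) stays at or below base_lr * sqrt(ref_loss).
        print(f"loss={loss:5.1f}  lr={lr:.2e}  lr*sqrt(loss)={lr * math.sqrt(loss):.2e}")
```

In a PyTorch training loop, one could compute the batch loss, pass `loss.item()` to this function, and write the result into the `"lr"` key of each entry in `optimizer.param_groups` before calling `optimizer.step()`. This summary does not say how FINCH chooses its reference level (a fixed value, a running average of recent losses, or something else), so `ref_loss` is left as a free parameter here.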

Findings

01. FINCH reduces forgetting by 93% on average across benchmarks.
02. On Qwen3-4B, FINCH cuts TruthfulQA degradation by 5×.
03. FINCH better preserves model confidence calibration.

Abstract

Fine-tuning large language models on new data improves task performance but degrades capabilities learned during pretraining, a phenomenon known as catastrophic forgetting. Existing methods mitigate this by modifying the fine-tuning objective to suppress high-loss tokens or sequences, but these tokens are essential for learning new tasks, especially those with poor pretraining coverage. In such settings, hard tokens should still contribute to learning, so forgetting must be controlled without suppressing them. We identify a simple mechanism for doing so: per-step forgetting is bounded by the product of the learning rate and the square root of the current training loss. This suggests that high-loss batches are especially prone to inducing forgetting. Motivated by this observation, we introduce FINCH, a loss-adaptive learning-rate schedule that reduces the learning rate on high-loss…
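The bound stated in the abstract can be written out as follows. This is a sketch under assumptions: ΔF_t (forgetting incurred at step t, e.g. the increase in loss on pretraining-style data), η_t (learning rate at step t), L_t (training loss on the current batch), the constant C, and the per-step budget B are notation introduced here for illustration; the abstract does not give the paper's exact forgetting measure or constant.

```latex
% Per-step forgetting bound as stated in the abstract (C > 0 is an assumed constant)
\Delta F_t \;\le\; C\,\eta_t\,\sqrt{\mathcal{L}_t}

% Keeping the right-hand side within a hypothetical budget B (for \mathcal{L}_t > 0):
C\,\eta_t\,\sqrt{\mathcal{L}_t} \le B
\quad\Longleftrightarrow\quad
\eta_t \le \frac{B}{C\sqrt{\mathcal{L}_t}}

% One illustrative loss-adaptive choice (not necessarily FINCH's rule):
\eta_t = \min\!\left(\eta_{\max},\; \frac{B}{C\sqrt{\mathcal{L}_t}}\right)
```

Under this choice the learning rate stays at η_max on low-loss batches and shrinks as 1/√L_t on high-loss ones, which matches the abstract's observation that high-loss batches are the most prone to inducing forgetting.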
