Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control
Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang

TL;DR
This paper introduces Anchored Learning, a distributional control framework for LLM fine-tuning that reduces catastrophic forgetting by explicitly limiting how far each update moves the model's output distribution, improving both performance and training stability.
Contribution
It proposes an anchor-based method in which a moving anchor interpolates between the current model and a frozen reference model, supported by a theoretical per-iteration guarantee and empirical validation on multiple benchmarks.
Findings
Significantly reduces performance degradation during fine-tuning.
Achieves near-optimal performance gains while maintaining stability.
A proven per-iteration linear upper bound on the KL divergence keeps distributional updates stable.
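To illustrate why an interpolated anchor can yield a bound that is linear in the interpolation weight, assume (this is not stated in the excerpt) that the anchor is the probability mixture alpha * p_ref + (1 - alpha) * p_current. Joint convexity of KL then gives KL(anchor || p_current) <= alpha * KL(p_ref || p_current). The minimal sketch below checks this on toy three-token distributions; the paper's own anchor and bound may be defined differently, and `mix` and `alpha` are hypothetical names.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(p_ref, p_cur, alpha):
    """Hypothetical anchor (assumption): alpha * p_ref + (1 - alpha) * p_cur."""
    return [alpha * r + (1 - alpha) * c for r, c in zip(p_ref, p_cur)]

p_cur = [0.7, 0.2, 0.1]   # toy current-model distribution
p_ref = [0.2, 0.3, 0.5]   # toy frozen reference distribution
full = kl(p_ref, p_cur)   # divergence of a full jump to the reference

for alpha in [0.0, 0.1, 0.25, 0.5, 1.0]:
    step = kl(mix(p_ref, p_cur, alpha), p_cur)
    # Joint convexity of KL guarantees step <= alpha * full.
    print(f"alpha={alpha:.2f}  KL(anchor||current)={step:.4f}  bound={alpha * full:.4f}")
    assert step <= alpha * full + 1e-12
```

Under this assumption, a small alpha caps the size of each update at a proportional fraction of the full divergence to the reference.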
Abstract
Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of matching a fixed reference distribution, the anchor interpolates between the current model and a frozen reference to construct an intermediate target that the model distills toward, transforming global fine-tuning into a sequence of local trust-region updates in distribution space. Theoretically, we prove this anchor-based update admits a linear KL-divergence upper bound per iteration, ensuring a…
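The moving-anchor mechanism described in the abstract can be sketched on a toy categorical "model" whose only parameters are three logits. This is a minimal illustration under assumptions the excerpt does not confirm: the anchor is taken to be the probability mixture alpha * p_ref + (1 - alpha) * p_current, it is rebuilt at every iteration and held fixed while the model distills toward it by plain gradient descent on KL(anchor || p), and the names `anchored_training`, `distill`, `alpha`, and `iters` are hypothetical. It is not the authors' implementation and omits the fine-tuning objective itself.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(logits, anchor, steps=200, lr=1.0):
    """Gradient descent on KL(anchor || softmax(logits)) with the anchor held fixed.

    The gradient with respect to the logits is softmax(logits) - anchor.
    """
    z = list(logits)
    for _ in range(steps):
        p = softmax(z)
        z = [zi - lr * (pi - ai) for zi, pi, ai in zip(z, p, anchor)]
    return z

def anchored_training(logits, p_ref, alpha=0.2, iters=10):
    """Hypothetical moving-anchor loop: each iteration distills toward
    alpha * p_ref + (1 - alpha) * p_current (an assumed anchor form)."""
    history = []
    for _ in range(iters):
        p = softmax(logits)
        anchor = [alpha * r + (1 - alpha) * c for r, c in zip(p_ref, p)]
        step_kl = kl(anchor, p)           # size of this local update
        bound = alpha * kl(p_ref, p)      # linear-in-alpha bound (joint convexity)
        history.append((step_kl, bound))
        logits = distill(logits, anchor)  # local "trust-region" step
    return logits, history

p_ref = [0.2, 0.3, 0.5]        # toy frozen reference distribution
init_logits = [2.0, 0.5, 0.0]  # toy initial model
final_logits, history = anchored_training(init_logits, p_ref)
for i, (step_kl, bound) in enumerate(history):
    print(f"iter {i}: KL(anchor||p)={step_kl:.5f} <= {bound:.5f}")
print("KL(ref||p) before:", round(kl(p_ref, softmax(init_logits)), 4),
      "after:", round(kl(p_ref, softmax(final_logits)), 4))
```

In this toy setting each iteration targets a distribution that is provably close to the current model, so the overall shift toward the reference happens as a sequence of small, bounded steps rather than one large jump.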
