Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang

TL;DR
This paper investigates overadaptation in supervised fine-tuning of language models, revealing that ensembling pretrained and fine-tuned models not only preserves general knowledge but also enhances performance on the target domain, supported by theoretical analysis and experiments.
Contribution
The paper provides the first formal theoretical analysis of overadaptation in language models and shows that ensembling is effective at balancing bias and variance during fine-tuning.
Findings
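Ensembling a pretrained model with its fine-tuned counterpart retains general knowledge and outperforms the fine-tuned model even on the fine-tuning domain.
Language models exhibit an overadaptation phenomenon under supervised fine-tuning.
The theoretical analysis supports the empirical results.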
Abstract
Supervised fine-tuning (SFT) on domain-specific data is the dominant approach for adapting foundation models to specialized tasks. However, it has been observed that SFT models tend to forget knowledge acquired during pretraining. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. In this work, we demonstrate that the same holds for language models, and, more strikingly, we observe an overadaptation phenomenon: the ensemble model not only retains general knowledge from the foundation model but also outperforms the fine-tuned model even on the fine-tuning domain itself. Despite the empirical success of ensembling, a theoretical understanding of its benefits remains underexplored. We develop a formal theoretical analysis of the overadaptation phenomenon. Ensembling mitigates this by balancing two primary sources of error:…
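The abstract does not say how the pretrained and fine-tuned models are combined. A minimal sketch, assuming the ensemble is a linear interpolation of the two models' weights with a mixing coefficient `alpha` (a common choice in the vision literature, but an assumption here), might look like the following. The function name `interpolate_weights`, the `Weights` type, and the toy parameter names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(pretrained: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * pretrained + alpha * finetuned, parameter by parameter.

    alpha = 0 gives back the pretrained weights, alpha = 1 the fine-tuned ones.
    Assumption: both models share the same architecture and parameter names.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must share the same parameter names")

    merged: Weights = {}
    for name, base in pretrained.items():
        tuned = finetuned[name]
        if len(base) != len(tuned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two weight vectors.
        merged[name] = [(1.0 - alpha) * b + alpha * t for b, t in zip(base, tuned)]
    return merged


# Toy weights, made up purely for illustration.
pretrained = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}
merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
```

In this sketch `alpha` is the knob that trades off staying close to the pretrained model against fitting the fine-tuning data. In practice it would be chosen on held-out data rather than fixed at 0.5.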
Taxonomy
Topics: Image and Signal Denoising Methods · Hearing Loss and Rehabilitation
