Grow, Don't Overwrite: Fine-tuning Without Forgetting
Dyah Adila, Hanna Mazzawi, Benoit Dherin, Xavier Gonzalvo

TL;DR
This paper presents a novel method for fine-tuning pre-trained models that expands model capacity without overwriting existing knowledge, avoiding catastrophic forgetting and maintaining original capabilities.
Contribution
We introduce a function-preserving expansion technique that allows stable, efficient fine-tuning by replicating and scaling parameters within transformer models.
Findings
Eliminates the trade-off between plasticity and stability.
Matches full fine-tuning performance without degrading original capabilities.
Achieves comparable results when only a subset of layers is expanded.
Abstract
Adapting pre-trained models to specialized tasks often leads to catastrophic forgetting, where new knowledge overwrites foundational capabilities. Existing methods either compromise performance on the new task or struggle to balance training stability with efficient reuse of pre-trained knowledge. We introduce a novel function-preserving expansion method that resolves this dilemma. Our technique expands model capacity by replicating pre-trained parameters within transformer submodules and applying a scaling correction that guarantees the expanded model is mathematically identical to the original at initialization, enabling stable training while exploiting existing knowledge. Empirically, our method eliminates the trade-off between plasticity and stability, matching the performance of full fine-tuning on downstream tasks without any degradation of the model's original capabilities.…
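As a minimal sketch of the replicate-and-scale idea described in the abstract, the Python snippet below duplicates the hidden units of a toy two-layer feed-forward block and divides the output weights by the number of copies, so the expanded block computes the same function at initialization. The abstract does not spell out the paper's exact construction, so the function names (`mlp`, `expand_mlp`), the bias-free ReLU block, and the uniform 1/copies scaling are assumptions made for illustration only, not the authors' implementation.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def mlp(W_in, W_out, x):
    # Toy feed-forward block without biases: W_out @ relu(W_in @ x).
    return matvec(W_out, relu(matvec(W_in, x)))


def expand_mlp(W_in, W_out, copies=2):
    # Hypothetical helper (assumption, not from the paper):
    # replicate every hidden unit `copies` times by repeating the rows of W_in,
    # then scale the matching columns of W_out by 1/copies so that the
    # summed contribution of the replicas equals the original contribution.
    W_in_big = [list(row) for _ in range(copies) for row in W_in]
    W_out_big = [[w / copies for _ in range(copies) for w in row] for row in W_out]
    return W_in_big, W_out_big


random.seed(0)
d_model, d_hidden = 4, 6
W_in = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
W_out = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
x = [random.gauss(0, 1) for _ in range(d_model)]

W_in_big, W_out_big = expand_mlp(W_in, W_out, copies=2)
y_small = mlp(W_in, W_out, x)
y_big = mlp(W_in_big, W_out_big, x)

# Largest absolute difference between the original and expanded outputs.
max_gap = max(abs(a - b) for a, b in zip(y_small, y_big))
print(max_gap)
```

In this sketch the hidden width doubles from 6 to 12 while `max_gap` stays at floating-point rounding level, which is the sense in which the expanded block matches the original at initialization.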
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
