Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
Anton Alexandrov, Veselin Raychev, Mark Niklas Müller, Ce Zhang, Martin Vechev, Kristina Toutanova

TL;DR
This paper introduces Branch-and-Merge (BaM), a novel model merging technique that reduces catastrophic forgetting in language transfer tasks, enabling better adaptation of large language models to new languages without losing original capabilities.
Contribution
The paper proposes BaM, a new iterative model merging method that minimizes forgetting during language adaptation, outperforming standard finetuning approaches.
Findings
BaM significantly reduces forgetting in Bulgarian and German models.
BaM maintains or improves target domain performance compared to standard methods.
BaM is effective across different model architectures.
Abstract
As open-weight large language models (LLMs) achieve ever more impressive performances across a wide range of tasks in English, practitioners aim to adapt these models to different languages. However, such language adaptation is often accompanied by catastrophic forgetting of the base model's capabilities, severely limiting the usefulness of the resulting model. We address this issue by proposing Branch-and-Merge (BaM), a new adaptation method based on iteratively merging multiple models, fine-tuned on a subset of the available training data. BaM is based on the insight that this yields lower magnitude but higher quality weight changes, reducing forgetting of the source domain while maintaining learning on the target domain. We demonstrate in an extensive empirical study on Bulgarian and German that BaM can significantly reduce forgetting while matching or even improving target domain…
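The iterative procedure described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the merge operator (plain parameter averaging), the least-squares `fine_tune` stand-in, and the names `branch_and_merge`, `branches`, and `data_slices` are all assumptions made for illustration. The sketch only shows the control flow: split the training data into slices, branch several copies from the current model, fine-tune each on its own slice, merge them, and repeat from the merged model.

```python
import numpy as np

def fine_tune(weights, data_slice, lr=0.1, steps=20):
    """Hypothetical stand-in for fine-tuning: gradient descent on a least-squares loss."""
    X, y = data_slice
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def merge(models):
    """Assumed merge operator: plain parameter averaging (the paper may use another)."""
    return np.mean(models, axis=0)

def branch_and_merge(base, data_slices, branches=2):
    """Branch `branches` models from the current model, train each on its own
    data slice, merge them, and continue from the merged model."""
    current = base.copy()
    for start in range(0, len(data_slices), branches):
        group = data_slices[start:start + branches]
        tuned = [fine_tune(current, s) for s in group]
        current = merge(tuned)
    return current

# Toy data: four slices drawn from the same linear target.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
slices = []
for _ in range(4):
    X = rng.normal(size=(32, 3))
    slices.append((X, X @ true_w))

base = np.zeros(3)
bam_w = branch_and_merge(base, slices, branches=2)
```

With four slices and `branches=2`, the loop runs two branch-and-merge iterations. In a real setting, `fine_tune` would be a training run of an LLM on a slice of the target-language data, and `merge` would operate on full model state dictionaries.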
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Bottleneck Attention Module · Balanced Selection
