Unlocking the Potential of Model Merging for Low-Resource Languages

Mingxu Tao; Chen Zhang; Quzhe Huang; Tianyao Ma; Songfang Huang; Dongyan Zhao; Yansong Feng

arXiv:2407.03994 · cs.CL · February 10, 2025

TL;DR

This paper proposes model merging as an alternative to the conventional CT-then-SFT pipeline for adapting large language models to low-resource languages. Merging combines existing models without additional training, and it outperforms CT-then-SFT when target-language data is extremely scarce.

Contribution

It applies model merging to low-resource language adaptation, producing task-solving LLMs without additional training and without SFT data in the target language, which sidesteps the data-scarcity problem.

Findings

01

Model merging outperforms the traditional CT-then-SFT pipeline when target-language data is extremely scarce.

02

Introducing a slack variable improves merging performance by preserving important parameters.

03

Model merging equips LLMs with task-solving abilities in low-resource languages, even without target-language SFT data.
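
Finding 02 concerns parameters that are lost during merging. This page does not describe the slack-variable method itself, so what follows is only a sketch of standard TIES-Merging, a widely used merging algorithm whose trim step deliberately discards low-magnitude updates. The slack-variable modification is not implemented, and whether it targets this exact step is not stated here. Models are assumed to share one base and are represented as flat Python lists of floats; the names `trim`, `ties_merge`, `density`, and `scale` are hypothetical.

```python
from typing import List


def trim(vec: List[float], density: float) -> List[float]:
    """Keep the top `density` fraction of entries by magnitude; zero out the rest."""
    k = max(1, int(round(len(vec) * density)))
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]


def ties_merge(base: List[float], models: List[List[float]],
               density: float = 0.2, scale: float = 1.0) -> List[float]:
    """Standard TIES-Merging on flat weight lists (hypothetical helper, no slack variable)."""
    # Step 1: task vectors relative to the shared base, trimmed by magnitude.
    vectors = [trim([m - b for m, b in zip(model, base)], density) for model in models]
    merged = []
    for i, b in enumerate(base):
        values = [v[i] for v in vectors]
        # Step 2: elect one sign per parameter from the sum of trimmed values.
        total = sum(values)
        sign = 1.0 if total > 0 else (-1.0 if total < 0 else 0.0)
        # Step 3: disjoint mean over the nonzero values that agree with that sign.
        agreeing = [x for x in values if x * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [0.9, -0.1, 0.5, 0.0]   # hypothetical fine-tuned weights
    model_b = [-0.3, 0.2, 0.7, 0.05]  # hypothetical fine-tuned weights
    print(ties_merge(base, [model_a, model_b], density=0.5))
```

Lowering `density` discards more of each model's update; this trim step is one place where a merged model can drop parameters that mattered to one of its source models.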

Abstract

Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we…
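
Merging "without additional training" means the merged weights are computed arithmetically from existing checkpoints. The abstract does not name the merging algorithm, so this sketch assumes plain task arithmetic: subtract a shared base (for example Llama-2-7B) from each derived model to get a task vector, then add the scaled sum of those vectors back onto the base. The toy checkpoints, the parameter name `layer.weight`, and the functions `task_vector` and `task_arithmetic_merge` are hypothetical; real checkpoints hold tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-parameter difference between a derived model and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic_merge(base: Params, models: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of every model's task vector back onto the base."""
    vectors = [task_vector(m, base) for m in models]
    merged: Params = {}
    for name, weights in base.items():
        merged[name] = [
            w + scale * sum(v[name][i] for v in vectors)
            for i, w in enumerate(weights)
        ]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0, 3.0]}
    ct_model = {"layer.weight": [1.5, 2.0, 2.0]}   # hypothetical CT checkpoint (language modeling)
    sft_model = {"layer.weight": [1.0, 3.0, 3.5]}  # hypothetical SFT checkpoint (task solving)
    print(task_arithmetic_merge(base, [ct_model, sft_model]))
    # {'layer.weight': [1.5, 3.0, 2.5]}
```

Because `scale` multiplies the summed vectors, values below 1.0 damp both contributions equally; per-model coefficients are a common variant when one capability should dominate.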
