The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs

Lucas Bandarkar; Nanyun Peng

arXiv:2505.18356 · cs.CL · October 9, 2025

TL;DR

This paper shows that modular fine-tuning and model merging techniques, especially layer swapping, improve cross-lingual transfer to lower-resource languages in large language models. They work by exploiting the fact that the parameters that matter most for mathematical reasoning and those that matter most for multilingual capabilities form distinctly non-overlapping subsets.

Contribution

The paper introduces and validates modular frameworks that improve cross-lingual transfer by assigning math and language improvements to different parts of the model, using either parameter freezing during fine-tuning or post hoc model merging.

Findings

01

Layer-swapping via model merging is highly effective.

02

Modular approaches outperform baseline fine-tuning methods.

03

Reverting less useful updates can outperform freezing from the start.
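The third finding contrasts two ways of restricting fine-tuning to part of a model: freezing parameters before training, or training everything and then resetting some parameters to their pretrained values. A minimal sketch of the second option follows, using plain dictionaries in place of real checkpoints. The function name `revert_updates`, the `layers.N.` naming scheme, and the choice of which parameters to keep are illustrative assumptions; this summary does not say how the paper selects the updates to revert.

```python
def revert_updates(base, finetuned, keep_prefixes):
    """Keep fine-tuned values only for parameters whose names start with one
    of keep_prefixes; reset every other parameter to its pretrained value.

    base / finetuned: dicts mapping parameter names to values
    (hypothetical stand-ins for model state dicts).
    """
    reverted = {}
    for name, base_value in base.items():
        if any(name.startswith(prefix) for prefix in keep_prefixes):
            reverted[name] = finetuned[name]   # keep the fine-tuned update
        else:
            reverted[name] = base_value        # undo the update after training
    return reverted


# Toy example: three "layers", all updated during fine-tuning.
base = {"layers.0.weight": 0.0, "layers.1.weight": 0.0, "layers.2.weight": 0.0}
finetuned = {"layers.0.weight": 0.5, "layers.1.weight": 0.1, "layers.2.weight": 0.7}

# Assumption: the caller has already decided that layer 1's update is the
# "less useful" one, so only layers 0 and 2 are kept.
reverted = revert_updates(base, finetuned, keep_prefixes=("layers.0.", "layers.2."))
print(reverted)
```

Unlike freezing, every parameter is free to move during training here; the restriction is applied only afterwards.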

Abstract

Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce. Building on prior work, we first validate that the subsets of model parameters that matter most for mathematical reasoning and multilingual capabilities are distinctly non-overlapping. To exploit this implicit separability between task and target language parameterization, we develop and analyze numerous modular frameworks to improve the composition of the two during fine-tuning. These methods generally employ freezing parameters or post hoc model merging to assign math and language improvement to different key parts of the LLM. In the absence of in-language math data, we demonstrate that the modular approaches successfully improve upon baselines across three…
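As a rough illustration of the post hoc merging idea in the abstract, the sketch below composes a math-tuned checkpoint and a language-tuned checkpoint by taking whole layers from one or the other. It is a toy under stated assumptions, not the authors' implementation: the checkpoints are plain dictionaries, the `layers.N.` naming scheme and the helper names `layer_index` and `layer_swap` are hypothetical, and the set of layers given to the language expert is a caller-supplied parameter because this summary does not specify it.

```python
import re


def layer_index(param_name):
    """Return the layer index encoded in a parameter name, or None if absent."""
    match = re.search(r"layers\.(\d+)\.", param_name)
    return int(match.group(1)) if match else None


def layer_swap(math_expert, language_expert, language_layers):
    """Start from the math expert and replace the parameters of the chosen
    layers with those of the language expert.

    Both experts are assumed to be fine-tuned from the same base model, so
    their parameter names and shapes match.
    """
    merged = {}
    for name, value in math_expert.items():
        idx = layer_index(name)
        if idx is not None and idx in language_layers:
            merged[name] = language_expert[name]   # language-assigned layer
        else:
            merged[name] = value                   # everything else stays math
    return merged


# Toy checkpoints: four layers plus an embedding table.
math_expert = {f"layers.{i}.weight": [1.0, 1.0] for i in range(4)}
math_expert["embed.weight"] = [1.0]
language_expert = {f"layers.{i}.weight": [2.0, 2.0] for i in range(4)}
language_expert["embed.weight"] = [2.0]

# Assumption for illustration only: layers 0 and 3 go to the language expert.
merged = layer_swap(math_expert, language_expert, language_layers={0, 3})
print(merged)
```

With real models the same loop would run over state dictionaries of tensors; the logic of selecting parameters by layer index is unchanged.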
