A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE

Hao Zhou; Tianhao Li; Zhijun Wang; Shuaijie She; Linjuan Wu; Hao-Ran Wei; Baosong Yang; Jiajun Chen; Shujian Huang

arXiv:2605.18083·cs.CL·May 19, 2026


TL;DR

This paper introduces \method, a method that efficiently expands the multilingual capabilities of large language models by integrating language-specific experts via parameter deltas, avoiding costly retraining.

Contribution

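The paper proposes expanding LLMs to new languages by upcycling a dense model into an MoE architecture with language-specific experts and grafting a post-training parameter delta onto the CPT-enhanced base model, which bypasses the data-intensive alignment phase.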

Findings

01

\method improves performance on new languages while preserving original capabilities.

02

It outperforms baselines with comparable FLOPs or parameter counts.

03

The approach is applicable across different models and post-training deltas.

Abstract

Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by fusing a multilingual CPT-enhanced model with its instruct counterpart, they are plagued by a critical trade-off: mitigating parameter conflicts to preserve original abilities inevitably dilutes new language acquisition, and vice-versa. To resolve this conflict, we introduce \method, which upcycles a dense model into a Mixture-of-Experts (MoE) architecture, allocating different experts to different languages. Alignment ability is then transferred by grafting a MoE-expanded parameter delta ($\Delta_{post}$) to the CPT-enhanced base model, bypassing the complex alignment phase. Experiments demonstrate \method's superiority even against baselines with…
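The two steps named in the abstract, upcycling a dense model into experts and grafting a MoE-expanded parameter delta, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that the delta is the element-wise difference between instruct and base weights, that upcycling copies each feed-forward tensor into every expert, and that tensors with no dense counterpart (such as a router) receive a zero delta. The names `param_delta`, `upcycle`, `graft`, and the `ffn.` / `expert{i}.` key scheme are hypothetical.

```python
from typing import Dict, List

# Toy "model": tensor name -> flat list of weights (hypothetical layout).
Params = Dict[str, List[float]]


def param_delta(instruct: Params, base: Params) -> Params:
    """Assumed definition: delta_post = theta_instruct - theta_base, per tensor."""
    return {k: [a - b for a, b in zip(instruct[k], base[k])] for k in base}


def upcycle(dense: Params, num_experts: int) -> Params:
    """Copy every 'ffn.' tensor into num_experts experts; keep other tensors shared."""
    moe: Params = {}
    for name, weights in dense.items():
        if name.startswith("ffn."):
            for e in range(num_experts):
                moe[f"expert{e}.{name}"] = list(weights)
        else:
            moe[name] = list(weights)
    return moe


def graft(cpt_moe: Params, moe_delta: Params) -> Params:
    """Add the expanded delta; tensors without a delta entry stay unchanged."""
    grafted: Params = {}
    for name, weights in cpt_moe.items():
        delta = moe_delta.get(name, [0.0] * len(weights))
        grafted[name] = [w + d for w, d in zip(weights, delta)]
    return grafted


base = {"attn.w": [1.0, 2.0], "ffn.w": [0.5, -0.5]}
instruct = {"attn.w": [1.5, 2.0], "ffn.w": [0.0, -0.5]}

# Expand the dense delta to the MoE layout by reusing the same upcycling rule.
moe_delta = upcycle(param_delta(instruct, base), num_experts=2)

# Stand-in for continued pre-training: expert1 drifts toward a new language,
# and a router tensor appears that the dense model never had.
cpt_moe = upcycle(base, num_experts=2)
cpt_moe["expert1.ffn.w"] = [0.75, -0.25]
cpt_moe["router.w"] = [0.25, 0.25]

final = graft(cpt_moe, moe_delta)
```

In this toy setup the untouched expert ends up identical to the instruct model's feed-forward weights, while the expert changed by continued pre-training keeps its offset and still receives the same post-training shift.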
