MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Hao Zhou; Zhijun Wang; Shujian Huang; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Weihua Luo; Jiajun Chen

arXiv:2408.11396·cs.CL·August 22, 2024

TL;DR

MoE-LPR introduces a two-stage training method using mixture-of-experts and language priors routing to expand the multilingual capabilities of large language models while preserving their knowledge of the original languages.
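In a mixture-of-experts layer, a router scores every expert for each token, and only a few of them are actually run. The following is a minimal sketch of one common top-k gating scheme in plain Python. The function name `top_k_gate` and its arguments are hypothetical, and this summary does not say which gating variant MoE-LPR uses.

```python
import math


def top_k_gate(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    router_logits: one float per expert for a single token.
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Rank expert indices by logit, highest first, and keep the top k.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax restricted to the chosen experts (shifted by the max for stability).
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


print(top_k_gate([0.1, 2.0, -1.0, 1.5], k=2))  # experts 1 and 3 are selected
```

The token's output would then be the weighted sum of the chosen experts' outputs. The unselected experts are never evaluated for that token.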

Contribution

The paper presents a novel two-stage training approach with language priors routing for effective multilingual expansion without catastrophic forgetting.
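This summary names language priors routing but does not define it, and the abstract below is cut off before the second stage. The following is a minimal sketch under one stated assumption: the "prior" is that tokens from the original languages should be routed to the frozen original expert (index 0 here), and this is enforced with a cross-entropy penalty on the router. The function `lpr_loss` and its parameters are hypothetical illustrations, not the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lpr_loss(router_logits, is_original_lang, frozen_expert=0):
    """Hypothetical language-priors routing loss (an assumption, see above).

    router_logits: one list of per-expert logits for each token.
    is_original_lang: one bool per token, True if the token belongs to an
        original (pre-expansion) language.
    Only original-language tokens contribute. Each adds the cross-entropy
    between the router distribution and the frozen expert's index, which
    pushes the router to send those tokens to the frozen expert.
    """
    losses = []
    for logits, original in zip(router_logits, is_original_lang):
        if original:
            probs = softmax(logits)
            losses.append(-math.log(probs[frozen_expert]))
    # Average over contributing tokens; no original-language tokens means no penalty.
    return sum(losses) / len(losses) if losses else 0.0


batch_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
print(lpr_loss(batch_logits, is_original_lang=[True, False]))
```

Tokens from the newly added languages are left unconstrained in this sketch, so the router remains free to send them to the new experts.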

Findings

1. Outperforms existing post-pretraining methods on multiple benchmarks.
2. Preserves original language knowledge while expanding to new languages.
3. Maintains inference efficiency despite increased parameters.
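The efficiency finding follows from sparse activation. With top-k routing, each token passes through only k experts, so per-token compute grows with k rather than with the total number of experts. The following back-of-the-envelope sketch makes the following assumptions: SwiGLU-style experts with three bias-free matrices, and a bias-free linear router. The function `moe_ffn_params` and all sizes are illustrative, not the paper's configuration.

```python
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    """Rough FFN parameter counts for one MoE layer (illustrative only).

    Assumes each expert has three bias-free matrices (gate and up:
    d_model x d_ff, down: d_ff x d_model) and that the router is a
    bias-free d_model x num_experts matrix.
    Returns (total_params, active_params_per_token).
    """
    per_expert = 3 * d_model * d_ff
    router = d_model * num_experts
    total = num_experts * per_expert + router  # parameters stored in memory
    active = top_k * per_expert + router       # parameters each token uses
    return total, active


# Hypothetical sizes chosen only to show the scaling.
total, active = moe_ffn_params(d_model=4096, d_ff=11008, num_experts=8, top_k=2)
print(f"total: {total:,}  active per token: {active:,}")
```

Adding experts raises `total` (memory) roughly linearly, while `active` (per-token FFN compute) depends only on `top_k`.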

Abstract

Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often causes catastrophic forgetting of the model's abilities in its original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, which shows how difficult it is to expand to new languages while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance the multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus on improving…
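The first stage described in the abstract can be pictured at the level of a single feed-forward layer. The following is a minimal sketch, assuming standard sparse upcycling: the original dense FFN is kept as a frozen expert, new experts are initialized as copies of it and left trainable, and the newly added router is trainable. The names `upcycle_ffn` and `trainable_expert_ids` are hypothetical. The sketch also ignores the other original parameters (attention, embeddings), which the abstract says are frozen as well.

```python
import copy


def upcycle_ffn(dense_ffn, num_new_experts):
    """Hypothetical upcycling of one dense FFN into an MoE layer.

    dense_ffn: dict mapping weight names to nested lists of floats.
    Expert 0 keeps the original weights and is frozen. Each new expert
    starts as an independent deep copy of them and is trainable.
    The router is newly added, so it is marked trainable as well.
    """
    experts = [{"weights": dense_ffn, "trainable": False}]
    for _ in range(num_new_experts):
        experts.append({"weights": copy.deepcopy(dense_ffn), "trainable": True})
    return {"experts": experts, "router_trainable": True}


def trainable_expert_ids(moe_layer):
    """Indices of experts that would receive gradient updates in stage 1."""
    return [i for i, e in enumerate(moe_layer["experts"]) if e["trainable"]]


layer = upcycle_ffn({"w_up": [[0.1, 0.2]], "w_down": [[0.3], [0.4]]}, num_new_experts=3)
print(trainable_expert_ids(layer))  # [1, 2, 3]
```

The deep copy matters. Updating a new expert must never change the frozen original weights, because those weights carry the original-language knowledge the method aims to preserve.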

Code & Models

Videos

MoE-LPR: Multilingual Extension of Large Language Models Through Mixture-of-Experts with Language Priors Routing (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Mixture of Experts · Focus