FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

Shaolin Zhu; Tianyu Dong; Bo Li; Deyi Xiong

arXiv:2505.14256·cs.CL·May 21, 2025

TL;DR

FuxiMT is a Chinese-centric multilingual machine translation model that uses sparsified large language models, achieving superior performance especially in low-resource and zero-shot translation scenarios.

Contribution

The paper introduces FuxiMT, a novel sparsified LLM-based multilingual translation model with a two-stage training process and curriculum learning for improved low-resource and zero-shot translation.

Findings

1. Outperforms state-of-the-art baselines in various translation tasks.
2. Shows strong zero-shot translation capabilities for unseen language pairs.
3. Effective in low-resource translation scenarios.

Abstract

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
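The abstract names Mixture-of-Experts as the sparsification mechanism but gives no routing details. As a rough illustration of the general idea only, the sketch below shows generic top-k expert routing in plain Python: a gate scores every expert for an input, only the k best-scoring experts run, and their outputs are mixed by softmax weights. The function names, the toy scaling experts, the gate matrix, and k=2 are all assumptions made for this example; none of them are taken from the FuxiMT paper, and this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalise over the selected experts only, so the weights sum to 1.
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, gate_weights, experts, k=2):
    """Run a sparse MoE layer: only k experts are evaluated for input x."""
    # One gate row per expert; each logit is the dot product of that row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for index, weight in routes:
        expert_out = experts[index](x)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, routes


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input feature by `scale`."""
    return lambda x: [scale * xi for xi in x]


# Toy setup (assumed values): four experts, a two-dimensional input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, routes = moe_forward([2.0, 1.0], gate_weights, experts, k=2)
print(routes)
print(output)
```

In this toy setup only two of the four experts are evaluated for the input, which is the sense in which such a layer is sparse: the parameter count grows with the number of experts while the per-input compute stays roughly fixed.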

Code & Models

Datasets

liboaccn/nmt-parallel-corpus
dataset · 229 dl

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Big Data and Digital Economy

Methods: ADaptive gradient method with the OPTimal convergence rate