FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
Shaolin Zhu, Tianyu Dong, Bo Li, Deyi Xiong

TL;DR
FuxiMT is a Chinese-centric multilingual machine translation model built on a sparsified large language model. It outperforms strong baselines, particularly in low-resource settings, and can translate unseen language pairs zero-shot.
Contribution
The paper introduces FuxiMT, a sparsified LLM-based multilingual translation model trained in two stages (pre-training on a Chinese corpus, then multilingual fine-tuning on parallel data covering 65 languages) and using curriculum learning to improve low-resource and zero-shot translation.
Findings
Outperforms strong baselines, including state-of-the-art LLMs and machine translation models.
Shows strong zero-shot translation capabilities for unseen language pairs.
Gains are most pronounced in low-resource translation scenarios.
Abstract
In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
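The abstract names Mixture-of-Experts as the sparsification mechanism but gives no architectural details. As a rough illustration only, the sketch below shows generic top-k expert routing for a single token vector. It is not FuxiMT's implementation: the function names (`moe_forward`, `matvec`), the linear experts, the gating scheme, and all sizes are assumptions chosen for the example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_forward(token, gate_weights, experts, top_k=2):
    """Route one token through its top_k experts (hypothetical sketch).

    gate_weights: one row per expert, producing one routing logit each.
    experts: list of square weight matrices, one linear expert each.
    Returns (output vector, indices of the chosen experts).
    """
    logits = matvec(gate_weights, token)
    # Keep only the top_k highest-scoring experts; the rest are skipped,
    # which is what makes the layer sparse.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate scores over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(experts[0])
    for w, i in zip(weights, chosen):
        expert_out = matvec(experts[i], token)
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, chosen


# Illustrative sizes only; not taken from the paper.
random.seed(0)
dim, num_experts = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]
token = [random.gauss(0, 1) for _ in range(dim)]

output, chosen = moe_forward(token, gate_weights, experts, top_k=2)
print("experts used:", chosen)
print("output:", output)
```

Only `top_k` of the `num_experts` expert matrices are evaluated per token, so compute per token stays roughly constant as experts are added. Production MoE layers typically add load-balancing losses and batched dispatch, which this sketch omits.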
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Big Data and Digital Economy
Methods: ADaptive gradient method with the OPTimal convergence rate
