Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan, Yinquan Lu, WenHao Zhu, Lingpeng Kong, Lei Li, Yu Qiao, Jingjing Xu

TL;DR
Lego-MT introduces a detachable, plug-and-play multilingual translation model that improves efficiency and performance across 433 languages, outperforming larger models with significantly fewer parameters.
Contribution
The paper presents a novel detachable model architecture and an efficient training recipe for massively multilingual translation, addressing parameter interference and inference inefficiency.
Findings
Lego-MT with 1.2B parameters achieves an average gain of 3.2 spBLEU over baseline models.
With only 1.2B parameters, it outperforms the 12B-parameter M2M-100.
Training is 28.2 times faster than with traditional methods.
Abstract
Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even…
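The plug-and-play idea in the abstract, one branch per language with only the needed branches attached at inference time, can be illustrated with a toy sketch. This is not the Lego-MT implementation: the class `DetachableTranslator`, its methods, and the toy encoder and decoder functions are all hypothetical names introduced here, and a list of lowercase tokens stands in for the unified representation space that real branches would share as vectors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "representation" is a list of tokens, standing in for
# vectors in a unified space shared by all language branches.
Representation = List[str]
Encoder = Callable[[str], Representation]
Decoder = Callable[[Representation], str]


@dataclass
class DetachableTranslator:
    """Toy registry of per-language branches (illustrative only)."""
    encoder_factories: Dict[str, Callable[[], Encoder]] = field(default_factory=dict)
    decoder_factories: Dict[str, Callable[[], Decoder]] = field(default_factory=dict)
    loaded_encoders: Dict[str, Encoder] = field(default_factory=dict)
    loaded_decoders: Dict[str, Decoder] = field(default_factory=dict)

    def register(self, lang: str,
                 make_encoder: Callable[[], Encoder],
                 make_decoder: Callable[[], Decoder]) -> None:
        # Registering a branch stores only a factory; nothing is built yet.
        self.encoder_factories[lang] = make_encoder
        self.decoder_factories[lang] = make_decoder

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Attach only the source encoder and target decoder, on first use.
        if src not in self.loaded_encoders:
            self.loaded_encoders[src] = self.encoder_factories[src]()
        if tgt not in self.loaded_decoders:
            self.loaded_decoders[tgt] = self.decoder_factories[tgt]()
        shared = self.loaded_encoders[src](text)
        return self.loaded_decoders[tgt](shared)


def make_toy_encoder(lang: str) -> Callable[[], Encoder]:
    # Hypothetical stand-in for loading a language-specific encoder.
    def factory() -> Encoder:
        return lambda text: text.lower().split()
    return factory


def make_toy_decoder(lang: str) -> Callable[[], Decoder]:
    # Hypothetical stand-in for loading a language-specific decoder.
    def factory() -> Decoder:
        return lambda rep: f"<{lang}> " + " ".join(rep)
    return factory


model = DetachableTranslator()
for lang in ("en", "fr", "de"):
    model.register(lang, make_toy_encoder(lang), make_toy_decoder(lang))

output = model.translate("Hello World", src="en", tgt="fr")
print(output)
print(sorted(model.loaded_encoders), sorted(model.loaded_decoders))
```

In this sketch, translating from English to French attaches only the `en` encoder and the `fr` decoder; the `de` branch stays detached until a request needs it, which is the inference-cost argument for a detachable design.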
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: ALIGN
