Beyond English-Centric Multilingual Machine Translation
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

TL;DR
This paper develops a true Many-to-Many multilingual translation model for 100 languages, significantly improving direct non-English translation quality and providing open-source datasets and models for broader research.
Contribution
It introduces a comprehensive training dataset and a scalable model architecture for direct translation between any pair of 100 languages, moving beyond English-centric approaches.
Findings
Gains of more than 10 BLEU when translating directly between non-English languages
Competitive performance with top WMT systems
Open-source datasets and models for community use
Abstract
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings…
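To make the gap between English-Centric and Many-to-Many coverage concrete, the following minimal sketch counts ordered translation directions for a given number of languages. The function name `count_directions` is hypothetical and used only for illustration. The figures are upper bounds from simple combinatorics: the abstract states only that the dataset covers thousands of directions with supervised data, not every possible pair.

```python
def count_directions(num_languages: int) -> dict:
    """Count ordered (source, target) translation directions.

    Illustrative helper, not taken from the paper.
    """
    if num_languages < 2:
        raise ValueError("need at least two languages")
    # Many-to-Many: every ordered pair of distinct languages.
    many_to_many = num_languages * (num_languages - 1)
    # English-Centric: English into each other language, and back.
    english_centric = 2 * (num_languages - 1)
    return {
        "many_to_many": many_to_many,
        "english_centric": english_centric,
        # Directions that never involve English.
        "non_english": many_to_many - english_centric,
    }


counts = count_directions(100)
print(counts)
# {'many_to_many': 9900, 'english_centric': 198, 'non_english': 9702}
```

With 100 languages, fewer than 200 of the 9,900 possible directions involve English, which is why a model trained only on English-paired data sees no direct supervision for the large majority of directions.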
Code & Models
- 🤗 BSC-LT/salamandraTA-7b-instruct (model) · 1.6k dl · ♡ 25
- 🤗 facebook/m2m100_1.2B (model) · 61k dl · ♡ 201
- 🤗 facebook/m2m100_418M (model) · 377k dl · ♡ 339
- 🤗 valhalla/m2m100_tiny_random (model) · 1.4k dl
- 🤗 facebook/m2m100-12B-last-ckpt (model) · 15 dl · ♡ 25
- 🤗 facebook/m2m100-12B-avg-5-ckpt (model) · 37 dl · ♡ 9
- 🤗 facebook/m2m100-12B-avg-10-ckpt (model) · 9 dl
- 🤗 optimum/m2m100_418M (model) · 10 dl · ♡ 2
- 🤗 michaelfeil/ct2fast-m2m100_1.2B (model) · 57 dl · ♡ 10
- 🤗 michaelfeil/ct2fast-m2m100_418M (model) · 34 dl · ♡ 7
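As a rough illustration of how one of the checkpoints listed above might be used for direct non-English translation, here is a minimal sketch. It assumes the Hugging Face `transformers` library is installed together with `torch` and `sentencepiece`, and that the `facebook/m2m100_418M` weights can be downloaded. The `translate` helper and its parameter names are hypothetical; only the `M2M100Tokenizer` and `M2M100ForConditionalGeneration` classes come from the library.

```python
MODEL_ID = "facebook/m2m100_418M"  # smallest checkpoint in the list above


def translate(text, src_lang, tgt_lang, model_id=MODEL_ID):
    """Translate `text` from `src_lang` to `tgt_lang` (language codes such as "fr", "de").

    Hypothetical convenience wrapper; it reloads the model on every call,
    which is fine for a demo but wasteful in real use.
    """
    # Imported lazily so the module can be loaded without the heavy dependencies.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)

    # Tell the tokenizer which language the input is written in.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")

    # Force the decoder to start with the target-language token,
    # so the model translates directly without pivoting through English.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling `translate("La vie est belle.", "fr", "de")` would request a French-to-German translation in a single step. For repeated use, loading the tokenizer and model once outside the function would avoid the reload cost.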
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
