Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations
Akiko Eriguchi, Shufang Xie, Tao Qin, Hany Hassan Awadalla

TL;DR
This paper presents a two-stage training strategy (pretraining followed by finetuning) for multilingual neural machine translation that supports arbitrary X-Y translation directions and outperforms direct bilingual and pivot translation baselines without additional data or architecture changes.
Contribution
The paper introduces a practical two-stage approach, pretraining followed by finetuning, that improves multilingual neural machine translation (MNMT) quality across arbitrary X-Y translation directions, including zero-shot scenarios.
Findings
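Outperforms direct bilingual and pivot translation baselines by an average of +6.0 and +4.1 BLEU, respectively, for most directions on the WMT'21 multilingual translation task.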
Effective in large-scale data settings for real-world deployment.
No architecture modifications or extra data needed.
Abstract
Multilingual Neural Machine Translation (MNMT) enables one system to translate sentences from multiple source languages to multiple target languages, greatly reducing deployment costs compared with conventional bilingual systems. The MNMT training benefit, however, is often limited to many-to-one directions. The model performs poorly in one-to-many settings and in many-to-many settings with a zero-shot setup. To address this issue, this paper discusses how to practically build MNMT systems that serve arbitrary X-Y translation directions while leveraging multilinguality with a two-stage training strategy of pretraining and finetuning. Experimenting with the WMT'21 multilingual translation task, we demonstrate that our systems outperform the conventional baselines of direct bilingual models and pivot translation models for most directions, with average gains of +6.0 and +4.1 BLEU, respectively, without the need for…
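The two baselines named in the abstract, direct bilingual models and pivot translation, can be illustrated with a small sketch. It is not the authors' code. It assumes English as the pivot language (the abstract does not name the pivot), uses hypothetical language codes, and replaces real translation models with stub functions that only tag the text they receive. It shows why serving every X-Y direction with bilingual models is costly: N languages give N(N-1) directions, whereas pivoting needs only 2(N-1) models at the price of two translation passes for non-pivot pairs.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Direction = Tuple[str, str]
Translator = Callable[[str], str]


def all_directions(languages: List[str]) -> List[Direction]:
    """Every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


def bilingual_model_count(n_languages: int) -> int:
    """One direct bilingual model per ordered language pair."""
    return n_languages * (n_languages - 1)


def pivot_model_count(n_languages: int) -> int:
    """One model into the pivot and one out of it per non-pivot language."""
    return 2 * (n_languages - 1)


def make_stub(src: str, tgt: str) -> Translator:
    """Hypothetical stand-in for a trained model; it only tags the input."""
    return lambda text: f"[{src}->{tgt}] {text}"


def pivot_translate(
    text: str,
    src: str,
    tgt: str,
    models: Dict[Direction, Translator],
    pivot: str = "en",  # assumption: English pivot
) -> str:
    """Translate src -> tgt, routing through the pivot when neither side is the pivot."""
    if src == tgt:
        return text
    if src == pivot or tgt == pivot:
        return models[(src, tgt)](text)
    intermediate = models[(src, pivot)](text)
    return models[(pivot, tgt)](intermediate)


# Hypothetical language set, chosen only for illustration.
LANGUAGES = ["en", "de", "ja", "fr"]

# Pivot setup: only directions that touch the pivot language get a model.
PIVOT_MODELS: Dict[Direction, Translator] = {
    (s, t): make_stub(s, t)
    for (s, t) in all_directions(LANGUAGES)
    if "en" in (s, t)
}

if __name__ == "__main__":
    print(len(all_directions(LANGUAGES)), "directions in total")
    print(bilingual_model_count(len(LANGUAGES)), "direct bilingual models")
    print(pivot_model_count(len(LANGUAGES)), "models for pivot translation")
    print(pivot_translate("hello", "de", "ja", PIVOT_MODELS))
```

A single MNMT system of the kind the paper describes would replace the whole `PIVOT_MODELS` table with one model that handles each (source, target) pair directly in a single pass.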
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
