Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

Yuqing Tang; Chau Tran; Xian Li; Peng-Jen Chen; Naman Goyal; Vishrav Chaudhary; Jiatao Gu; Angela Fan

arXiv:2008.00401 · cs.CL · August 4, 2020 · 151 citations

TL;DR

This paper introduces a method for creating multilingual translation models through multilingual finetuning of pretrained models, enabling support for more languages and improved translation performance, especially for low-resource languages.

Contribution

The paper demonstrates that pretrained models can be extended to cover additional languages without loss of performance, shows that multilingual finetuning improves translation quality, and introduces the ML50 benchmark for standardized evaluation.

Findings

1. Multilingual finetuning achieves higher BLEU scores than bilingual finetuning and than multilingual models trained from scratch.

2. mBART is extended to support 50 languages without loss of performance.

3. The ML50 benchmark is created to support reproducible multilingual translation research.

Abstract

Recent work demonstrates the potential of multilingual pretraining for creating one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems can be created by finetuning on bitext. In this work, we show that multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one direction, a pretrained model is finetuned on many directions at the same time. Compared to multilingual models trained from scratch, starting from pretrained models incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low-resource languages where bitext is not available. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance. We double the number of…
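
The central idea of the abstract, finetuning one pretrained model on many translation directions at the same time, can be illustrated with a small data-mixing sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `direction_weights` and `mixed_stream`, the `<lang>` tag format, the toy sentences, and the temperature-scaled sampling rule are all hypothetical choices made here, since this page does not describe how directions are balanced or how languages are marked.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]        # (source sentence, target sentence)
Direction = Tuple[str, str]   # (source language code, target language code)


def direction_weights(
    sizes: Dict[Direction, int], temperature: float = 1.0
) -> Dict[Direction, float]:
    """Sampling probability per direction, proportional to size ** (1 / temperature).

    ASSUMPTION: temperature scaling is one common way to upsample small
    directions; it is not confirmed by the page as the paper's recipe.
    """
    scaled = {d: n ** (1.0 / temperature) for d, n in sizes.items()}
    total = sum(scaled.values())
    return {d: s / total for d, s in scaled.items()}


def mixed_stream(
    bitext: Dict[Direction, List[Pair]],
    steps: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Draw `steps` training examples from all directions into one stream."""
    rng = random.Random(seed)
    weights = direction_weights(
        {d: len(pairs) for d, pairs in bitext.items()}, temperature
    )
    directions = list(weights)
    probs = [weights[d] for d in directions]

    examples = []
    for _ in range(steps):
        # Pick a direction first, then a sentence pair within that direction.
        src_lang, tgt_lang = rng.choices(directions, weights=probs, k=1)[0]
        src, tgt = rng.choice(bitext[(src_lang, tgt_lang)])
        # Hypothetical tag format so a single model can tell directions apart.
        examples.append(
            {"src": f"<{src_lang}> {src}", "tgt": f"<{tgt_lang}> {tgt}"}
        )
    return examples


# Toy bitext: one larger direction and one smaller one.
bitext = {
    ("en", "de"): [("Hello", "Hallo"), ("Thank you", "Danke"), ("Good night", "Gute Nacht")],
    ("en", "fr"): [("Hello", "Bonjour")],
}

for example in mixed_stream(bitext, steps=5, temperature=5.0):
    print(example)
```

In this sketch, a temperature of 1.0 samples directions in proportion to their size, while larger temperatures move the mix toward uniform so the smaller direction is seen more often. A single model would then be finetuned on the resulting mixed stream rather than on one direction.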

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: mBART