Multilingual Denoising Pre-training for Neural Machine Translation

Yinhan Liu; Jiatao Gu; Naman Goyal; Xian Li; Sergey Edunov; Marjan Ghazvininejad; Mike Lewis; Luke Zettlemoyer

arXiv:2001.08210 · cs.CL · January 24, 2020 · 607 cites

TL;DR

This paper introduces mBART, a multilingual denoising auto-encoder pre-training method that significantly improves machine translation performance across various tasks by pre-training a full sequence-to-sequence model on large monolingual corpora.

Contribution

It presents one of the first approaches to pre-training a complete sequence-to-sequence model across multiple languages, so the pre-trained model can be fine-tuned directly for diverse MT tasks without task-specific modifications.

Findings

01 Up to 12 BLEU points of improvement in low-resource MT

02 Gains of over 5 BLEU points for document-level and unsupervised models

03 Enables transfer to language pairs without bi-text data

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU…
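
As a rough illustration of what "denoising full texts" can mean, the Python sketch below corrupts a short document by shuffling its sentences and replacing random token spans with a single mask symbol, then pairs the corrupted text with the original as the reconstruction target. The function names, the `<mask>` symbol, the uniform span lengths, and the default `mask_ratio` and `max_span` values are assumptions made for this example. The abstract only states that mBART uses the BART objective, so this should not be read as the authors' exact noising procedure.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def permute_sentences(sentences, rng):
    """Return the sentences in a random order (the input list is not modified)."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens, mask_ratio, max_span, rng):
    """Replace random contiguous spans with a single MASK token each.

    Hides at most round(len(tokens) * mask_ratio) tokens in total.
    Span lengths are drawn uniformly from 1..max_span, a simplification
    assumed for this sketch.
    """
    budget = int(round(len(tokens) * mask_ratio))
    out, i, masked = [], 0, 0
    while i < len(tokens):
        if masked < budget and rng.random() < mask_ratio:
            # Clip the span to the remaining tokens and the remaining budget.
            span = min(rng.randint(1, max_span), len(tokens) - i, budget - masked)
            out.append(MASK)
            i += span
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out


def make_denoising_pair(sentences, rng, mask_ratio=0.35, max_span=3):
    """Build (noisy_input, clean_target) token lists from one document."""
    original = [tok for s in sentences for tok in s.split()]
    reordered = [tok for s in permute_sentences(sentences, rng) for tok in s.split()]
    noisy = infill_spans(reordered, mask_ratio, max_span, rng)
    return noisy, original


if __name__ == "__main__":
    doc = ["the cat sat on the mat .", "it was warm .", "then it slept ."]
    noisy, original = make_denoising_pair(doc, random.Random(0))
    print("input :", " ".join(noisy))
    print("target:", " ".join(original))
```

A sequence-to-sequence model pre-trained this way would read `noisy` with its encoder and learn to generate `original` with its decoder, which is why both halves of the model are already initialized when fine-tuning on translation begins.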

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Linear Layer · mBART · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Adam · Softmax · Dropout