Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

TL;DR
This paper introduces mBART, a multilingual denoising auto-encoder pre-training method that significantly improves machine translation performance across various tasks by pre-training a full sequence-to-sequence model on large monolingual corpora.
Contribution
It presents one of the first methods for pre-training a complete sequence-to-sequence model in multiple languages. This enables direct fine-tuning for diverse MT tasks without task-specific modifications.
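Because the whole encoder-decoder is pre-trained, fine-tuning can reuse the same input layout, and only the language IDs change. The sketch below assumes the layout we understand mBART to use, as documented for the Hugging Face `MBartTokenizer`: the source is `tokens </s> [src_lang]`, the decoder input starts with the target language ID, and the labels are the target followed by `</s>` and its language ID. The helper name `format_example`, the `</s>` string, and the language codes `en_XX`/`ro_RO` (borrowed from the mbart-large-cc25 checkpoint) are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

EOS = "</s>"  # assumed end-of-sequence token string


def format_example(src_tokens: List[str], src_lang: str,
                   tgt_tokens: List[str], tgt_lang: str
                   ) -> Tuple[List[str], List[str], List[str]]:
    """Build (encoder_input, decoder_input, labels) for one training pair.

    Hypothetical helper: the same layout serves denoising pre-training
    (src_lang == tgt_lang, noised source) and translation fine-tuning.
    """
    encoder_input = src_tokens + [EOS, src_lang]
    labels = tgt_tokens + [EOS, tgt_lang]
    # Teacher forcing: rotate the trailing language ID to the front, so the
    # decoder starts from the target language ID and predicts labels[i] at step i.
    decoder_input = [labels[-1]] + labels[:-1]
    return encoder_input, decoder_input, labels


if __name__ == "__main__":
    # Denoising pair: corrupted English in, original English out.
    print(format_example("Where did I <mask> ?".split(), "en_XX",
                         "Where did I come from ?".split(), "en_XX"))
    # Translation pair: identical layout, only the language IDs differ.
    print(format_example("Who am I ?".split(), "en_XX",
                         "Cine sunt eu ?".split(), "ro_RO"))
```

The point of the sketch is that nothing task-specific is added for translation. The language ID alone tells the decoder which language to generate.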
Findings
Up to 12 BLEU points improvement in low-resource MT
Over 5 BLEU points gain in document-level and unsupervised models
Enables transfer to language pairs without bi-text data
Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU…
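The BART objective mentioned above corrupts a text and trains the model to reconstruct the original. Below is a minimal, stdlib-only Python sketch of an mBART-style noise function. It assumes the two noise types we understand the full paper to use: sentence permutation, and span masking that hides about 35% of the words in an instance, with span lengths drawn from a Poisson distribution (λ = 3.5). These values, the `<mask>`/`</s>` strings, and the function names `sample_poisson`, `choose_masked_positions`, and `add_noise` are assumptions for illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Optional

MASK = "<mask>"  # assumed mask-token string
EOS = "</s>"     # assumed sentence-separator string


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    The standard library has no Poisson sampler, so we build one.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def choose_masked_positions(n: int, mask_ratio: float, lam: float,
                            rng: random.Random) -> List[bool]:
    """Mark exactly round(n * mask_ratio) of n word positions as masked,
    using spans whose lengths are drawn from Poisson(lam)."""
    budget = round(n * mask_ratio)
    masked = [False] * n
    num_masked = 0
    while num_masked < budget:
        # Simplification: clamp span length to >= 1 (BART also allows
        # zero-length spans) and never exceed the remaining budget.
        length = min(max(1, sample_poisson(lam, rng)), budget - num_masked)
        start = rng.randrange(n)
        for i in range(start, min(start + length, n)):
            if not masked[i]:
                masked[i] = True
                num_masked += 1
    return masked


def add_noise(sentences: List[str], mask_ratio: float = 0.35, lam: float = 3.5,
              seed: Optional[int] = None) -> List[str]:
    """Corrupt one multi-sentence instance: permute sentence order, then
    replace each contiguous run of masked words with a single MASK."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)                      # 1) sentence permutation
    words = [s.split() for s in shuffled]
    total = sum(len(w) for w in words)
    # 2) span masking, budgeted over the whole instance, not per sentence
    masked = choose_masked_positions(total, mask_ratio, lam, rng)

    out: List[str] = []
    pos = 0
    for sent in words:
        prev_masked = False
        for word in sent:
            if masked[pos]:
                if not prev_masked:            # one MASK per masked run
                    out.append(MASK)
                prev_masked = True
            else:
                out.append(word)
                prev_masked = False
            pos += 1
        out.append(EOS)                        # separators are never masked
    return out


if __name__ == "__main__":
    print(add_noise(["Who am I ?", "Where did I come from ?"], seed=0))
```

Because each masked run collapses to one `<mask>`, the model must also infer how many words are missing. The reconstruction target is the original, unpermuted text.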
Code & Models
- 🤗 ELiRF/mbart-large-cc25-dacsa-ca (model) · 3 dl · ♡ 1
- 🤗 ELiRF/mbart-large-cc25-dacsa-es (model) · 38 dl · ♡ 4
- 🤗 facebook/mgenre-wiki (model) · 877 dl · ♡ 29
- 🤗 Short-Answer-Feedback/mbart-finetuned-saf-micro-job (model) · 7 dl · ♡ 1
- 🤗 Short-Answer-Feedback/mbart-finetuned-saf-legal-domain (model) · 2 dl · ♡ 1
- 🤗 Short-Answer-Feedback/mbart-score-finetuned-saf-micro-job (model) · 5 dl
- 🤗 Short-Answer-Feedback/mbart-score-finetuned-saf-legal-domain (model) · 7 dl
- 🤗 impresso-project/nel-mgenre-multilingual (model) · 235 dl · ♡ 4
- 🤗 MonicaDasari/FinalProject (model) · 1 dl
- 🤗 shadabtanjeed/mbart-banglish-to-bengali-transliteration (model) · 1 dl
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · mBART · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Adam · Softmax · Dropout
