Exploring Unsupervised Pretraining Objectives for Machine Translation

Christos Baziotis; Ivan Titov; Alexandra Birch; Barry Haddow

arXiv:2106.05634 · cs.CL · June 11, 2021

TL;DR

This paper systematically compares different unsupervised pretraining objectives for neural machine translation, revealing that supervised fine-tuning is less sensitive to the pretraining method than unsupervised translation, which requires strong cross-lingual representations.

Contribution

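The paper introduces alternative pretraining objectives whose inputs resemble real (full) sentences, produced by reordering and replacing words based on their context rather than masking them, and analyzes their impact on translation performance and model representations.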

Findings

01 Supervised NMT performance is minimally affected by pretraining objectives.

02 Unsupervised NMT is highly sensitive to the choice of pretraining objective.

03 Models with strong cross-lingual abilities are crucial for unsupervised translation.

Abstract

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English $\leftrightarrow$ German, English $\leftrightarrow$ Nepali and English $\leftrightarrow$ Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand…
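The contrast the abstract draws, between masking the input and noising it so that it still looks like a full sentence, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 0.35 noising ratio, the `max_shift` window, and the toy vocabulary are hypothetical, and uniform random replacement stands in for the context-based replacement the paper describes. It is not the authors' implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def _pick_positions(n_tokens, ratio, rng):
    """Choose which positions to corrupt (at least one for non-empty input)."""
    if n_tokens == 0:
        return set()
    n = min(n_tokens, max(1, round(n_tokens * ratio)))
    return set(rng.sample(range(n_tokens), n))


def mask_tokens(tokens, ratio=0.35, rng=None):
    """Masking: the encoder sees placeholders instead of the chosen words."""
    rng = rng or random.Random(0)
    positions = _pick_positions(len(tokens), ratio, rng)
    return [MASK if i in positions else t for i, t in enumerate(tokens)]


def replace_tokens(tokens, vocab, ratio=0.35, rng=None):
    """Replacement: the input stays a full sentence with some words swapped.

    Assumption: words are drawn uniformly from `vocab`; the paper replaces
    words based on their context, which this placeholder does not model.
    """
    rng = rng or random.Random(0)
    positions = _pick_positions(len(tokens), ratio, rng)
    return [rng.choice(vocab) if i in positions else t
            for i, t in enumerate(tokens)]


def shuffle_locally(tokens, max_shift=3, rng=None):
    """Reordering: each word moves only a few positions from where it was."""
    rng = rng or random.Random(0)
    # Add bounded random noise to each index, then sort by the noisy index.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    return [tokens[i] for i in order]


sentence = "the cat sat on the mat today".split()
toy_vocab = ["dog", "ran", "under", "a", "rug"]  # hypothetical vocabulary

masked = mask_tokens(sentence)
replaced = replace_tokens(sentence, toy_vocab)
reordered = shuffle_locally(sentence)
```

In each case the decoder's target would be the original `sentence`; only the corrupted encoder input differs between objectives.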

Code & Models

Repositories

cbaziotis/nmt-pretraining-objectives (PyTorch, official)
