PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

Machel Reid; Mikel Artetxe

arXiv:2108.01887 · cs.CL · August 5, 2021 · 1 citation

TL;DR

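PARADISE is a pretraining method for multilingual sequence-to-sequence models that integrates parallel data into the denoising objective. On average it improves machine translation by 2.0 BLEU points and cross-lingual natural language inference by 6.7 accuracy points, with results competitive with several popular models at lower computational cost.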

Contribution

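It extends the conventional denoising pretraining objective with parallel data in two ways: (i) replacing words in the noised sequence according to a multilingual dictionary, and (ii) predicting the reference translation from a parallel corpus instead of recovering the original sequence.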

Findings

01 Improves machine translation by 2.0 BLEU points on average

02 Improves cross-lingual natural language inference by 6.7 accuracy points on average

03 Achieves results competitive with several popular models at lower computational cost

Abstract

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data. In this paper, we present PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models), which extends the conventional denoising objective used to train these models by (i) replacing words in the noised sequence according to a multilingual dictionary, and (ii) predicting the reference translation according to a parallel corpus instead of recovering the original sequence. Our experiments on machine translation and cross-lingual natural language inference show an average improvement of 2.0 BLEU points and 6.7 accuracy points from integrating parallel data into pretraining, respectively, obtaining results that are competitive with several popular models at a fraction of…
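The two objectives described in the abstract can be illustrated with a minimal sketch of how training pairs might be built. This is not the authors' implementation: the function names, the token-level masking (a simplification of the noising used in denoising pretraining), the probabilities, and the toy dictionary are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

MASK = "<mask>"  # hypothetical mask symbol; the real vocabulary may differ


def dictionary_denoising_example(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    rng: random.Random,
    replace_prob: float = 0.3,
    mask_prob: float = 0.3,
) -> Tuple[List[str], List[str]]:
    """Objective (i), sketched: noise the input and swap some words for
    dictionary translations; the target is the original sequence."""
    source = []
    for tok in tokens:
        if tok in dictionary and rng.random() < replace_prob:
            # Replace the word with one of its dictionary translations.
            source.append(rng.choice(dictionary[tok]))
        elif rng.random() < mask_prob:
            # Simplified noising: mask a single token (assumption).
            source.append(MASK)
        else:
            source.append(tok)
    return source, list(tokens)


def translation_denoising_example(
    src_tokens: List[str],
    tgt_tokens: List[str],
    rng: random.Random,
    mask_prob: float = 0.3,
) -> Tuple[List[str], List[str]]:
    """Objective (ii), sketched: noise the source sentence; the target is
    the reference translation rather than the original source."""
    source = [MASK if rng.random() < mask_prob else t for t in src_tokens]
    return source, list(tgt_tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_dictionary = {"cat": ["gato", "chat"], "sleeps": ["duerme"]}  # toy data
    print(dictionary_denoising_example(["the", "cat", "sleeps"], toy_dictionary, rng))
    print(translation_denoising_example(["the", "cat", "sleeps"],
                                        ["el", "gato", "duerme"], rng))
```

In both cases the output is a (source, target) pair that a sequence-to-sequence model could be trained on; the only difference between the two objectives is where the cross-lingual signal enters: in the noised input for (i), and in the target for (ii).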

Code & Models

Repositories

machelreid/paradise (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling