DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

Shuming Ma; Li Dong; Shaohan Huang; Dongdong Zhang; Alexandre Muzio; Saksham Singhal; Hany Hassan Awadalla; Xia Song; Furu Wei

arXiv:2106.13736 · cs.CL · August 19, 2021 · 51 cites

TL;DR

DeltaLM is a pretrained multilingual encoder-decoder model that narrows the gap between pretrained encoders and language generation. It adds a decoder to an off-the-shelf pretrained multilingual encoder and pre-trains the combined model in a self-supervised way.

Contribution

It introduces a unified encoder-decoder pretraining framework that leverages both monolingual and bilingual data for improved language generation and translation performance.

Findings

01. Outperforms strong baselines on translation and generation tasks

02. Effective use of span corruption and translation span corruption pretraining tasks

03. Achieves state-of-the-art results in multiple NLP benchmarks

Abstract

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks,…
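
The two pre-training tasks named in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated on this page: T5-style sentinel formatting, hypothetical sentinel names (`<mask_0>`, `<mask_1>`, ...), hand-picked span positions instead of random sampling, and the reading that translation span corruption applies the same masking to a sentence concatenated with its translation. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start index, length); hypothetical representation


def span_corrupt(tokens: Sequence[str], spans: Sequence[Span]) -> Tuple[List[str], List[str]]:
    """Replace each span with a sentinel token.

    Returns (corrupted_input, target). The target lists each sentinel
    followed by the tokens it replaced, which the decoder would be
    trained to generate.
    """
    corrupted: List[str] = []
    target: List[str] = []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < cursor or length <= 0 or start + length > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<mask_{i}>"  # assumed sentinel naming
        corrupted.extend(tokens[cursor:start])  # keep text before the span
        corrupted.append(sentinel)              # span collapses to one sentinel
        target.append(sentinel)
        target.extend(tokens[start:start + length])
        cursor = start + length
    corrupted.extend(tokens[cursor:])  # keep text after the last span
    return corrupted, target


def translation_span_corrupt(
    source: Sequence[str], translation: Sequence[str], spans: Sequence[Span]
) -> Tuple[List[str], List[str]]:
    """Assumed variant for bilingual data: corrupt spans over the concatenated pair."""
    return span_corrupt(list(source) + list(translation), spans)


# Monolingual example: mask "you for" and "your".
mono_in, mono_out = span_corrupt(
    "Thank you for inviting me to your party".split(), [(1, 2), (6, 1)]
)

# Bilingual example: mask one token on each side of the pair.
bi_in, bi_out = translation_span_corrupt(
    "Thank you".split(), "Vielen Dank".split(), [(1, 1), (3, 1)]
)
```

In the bilingual case a masked span on one side can be recovered from its counterpart on the other side, which is one plausible reason such a task helps translation.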

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification