Pre-trained Language Model Representations for Language Generation

Sergey Edunov; Alexei Baevski; Michael Auli

arXiv:1903.09722 · cs.CL · April 2, 2019 · 24 cites

TL;DR

This paper examines how pre-trained language model representations can improve sequence-to-sequence models for machine translation and abstractive summarization, with gains of up to 5.3 BLEU in a simulated low-resource translation setup.

Contribution

It compares strategies for integrating pre-trained representations into sequence-to-sequence models and finds that they are most effective when added to the encoder network, for both translation and summarization.

Findings

01. Adding pre-trained representations to the encoder is the most effective strategy and slows inference by only 14%.

02. Gains of up to 5.3 BLEU in a simulated low-resource machine translation setup.

03. A new state of the art on the full-text version of CNN/DailyMail abstractive summarization.

Abstract

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
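
The encoder-side integration described in the abstract can be pictured with a toy sketch. The code below is an illustration under stated assumptions, not the paper's or fairseq's implementation: `ToyPretrainedLM`, `ToyEncoder`, and `make_table` are hypothetical names, the vectors are random, and the "contextual" step is a simple neighbor average standing in for a real language model. It also assumes one particular reading of "added to the encoder", namely that the pre-trained representations take the place of the learned input embeddings.

```python
import random


def make_table(vocab, dim, seed):
    """Build a token -> vector lookup with reproducible random values."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


class ToyPretrainedLM:
    """Hypothetical stand-in for a frozen pre-trained language model."""

    def __init__(self, vocab, dim=4, seed=0):
        self.dim = dim
        self.table = make_table(vocab, dim, seed)

    def represent(self, tokens):
        # Toy "contextual" vector: average of the token and its left neighbor.
        out, prev = [], [0.0] * self.dim
        for tok in tokens:
            vec = self.table[tok]
            out.append([(a + b) / 2.0 for a, b in zip(vec, prev)])
            prev = vec
        return out


class ToyEncoder:
    """Encoder input layer that optionally consumes pre-trained representations."""

    def __init__(self, vocab, dim=4, pretrained_lm=None, seed=1):
        # In a real model these embeddings would be learned from scratch.
        self.embed = make_table(vocab, dim, seed)
        self.pretrained_lm = pretrained_lm

    def input_vectors(self, tokens):
        if self.pretrained_lm is not None:
            # Assumption: pre-trained representations replace the learned embeddings.
            return self.pretrained_lm.represent(tokens)
        return [self.embed[tok] for tok in tokens]


vocab = ["the", "cat", "sat"]
sentence = ["the", "cat", "sat"]
lm = ToyPretrainedLM(vocab)
baseline = ToyEncoder(vocab)
augmented = ToyEncoder(vocab, pretrained_lm=lm)

baseline_inputs = baseline.input_vectors(sentence)
augmented_inputs = augmented.input_vectors(sentence)
print(len(baseline_inputs), len(augmented_inputs))
```

Both encoders receive one vector per source token; only the origin of those vectors differs. The rest of the encoder and the decoder are left out of the sketch.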

Code & Models

Repositories

pytorch/fairseq (pytorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications