Pre-trained Language Model Representations for Language Generation
Sergey Edunov, Alexei Baevski, Michael Auli

TL;DR
This paper explores how pre-trained language model representations can improve sequence-to-sequence models for machine translation and abstractive summarization, with the largest gains in low-resource settings.
Contribution
The paper systematically compares strategies for integrating pre-trained representations into seq2seq models and identifies which configuration works best for translation and summarization.
Findings
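Adding pre-trained representations to the encoder network is the most effective strategy and slows inference by only 14%.
Gains of up to 5.3 BLEU in a simulated low-resource machine translation setup; returns diminish with more labeled data, but improvements remain with millions of sentence pairs.
A new state of the art on the full-text version of CNN/DailyMail abstractive summarization.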
Abstract
Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
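The abstract does not say how the pre-trained representations enter the encoder. As a rough illustration only, the sketch below shows one common feature-based approach: mixing the outputs of a language model's layers with softmax-normalized weights to produce one vector per source token, which an encoder could then take as input. The function names (`softmax`, `combine_layers`), the `scale` parameter, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_outputs, layer_scores, scale=1.0):
    """Mix per-layer language model outputs into one vector per token.

    layer_outputs: list over layers; each entry is a list over tokens
                   of equal-length float vectors.
    layer_scores:  one unnormalized score per layer (these would be
                   learned parameters in a real model; assumed here).
    scale:         global scalar applied to the mixed vector (assumed).
    """
    weights = softmax(layer_scores)
    num_tokens = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    mixed = []
    for t in range(num_tokens):
        vec = [0.0] * dim
        for w, layer in zip(weights, layer_outputs):
            for d in range(dim):
                vec[d] += scale * w * layer[t][d]
        mixed.append(vec)
    return mixed


# Toy example: two LM layers, one token, two dimensions.
layer0 = [[1.0, 2.0]]
layer1 = [[3.0, 4.0]]
encoder_input = combine_layers([layer0, layer1], layer_scores=[0.0, 0.0])
print(encoder_input)
```

With equal layer scores the mix reduces to a plain average of the layers. In a trained model the scores would shift weight toward whichever layers help the downstream task.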
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
