Text Summarization with Pretrained Encoders

Yang Liu; Mirella Lapata

arXiv:1908.08345·cs.CL·September 6, 2019·150 cites

Open Access · 5 Repos

TL;DR

This paper demonstrates how BERT can be effectively adapted for both extractive and abstractive text summarization, introducing novel models and training strategies that achieve state-of-the-art results.

Contribution

The paper presents a unified BERT-based framework for extractive and abstractive summarization, including a new document encoder and fine-tuning methods to improve summary quality.

Findings

01. Achieves state-of-the-art results on three datasets

02. Introduces a novel document-level BERT encoder

03. Develops a two-staged fine-tuning approach

Abstract

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a…
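The idea of giving the pretrained encoder and the untrained decoder separate optimizer schedules can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the warmup-then-decay shape is borrowed from the "Linear Warmup With Linear Decay" entry in this page's Methods list, and the function names, step counts, and learning-rate values are all hypothetical, chosen only so that the encoder warms up more slowly and peaks lower than the decoder.

```python
def linear_warmup_linear_decay(step, peak_lr, warmup_steps, total_steps):
    """Rise linearly to peak_lr over warmup_steps, then fall linearly to 0 at total_steps."""
    if step <= 0:
        return 0.0
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical settings (not taken from the paper): the pretrained encoder gets a
# lower peak and a longer warmup, so it changes gently while the randomly
# initialised decoder is still unstable.
TOTAL_STEPS = 1000
ENCODER_SCHEDULE = {"peak_lr": 0.002, "warmup_steps": 200}
DECODER_SCHEDULE = {"peak_lr": 0.1, "warmup_steps": 100}


def learning_rates(step):
    """Return (encoder_lr, decoder_lr) for one training step."""
    enc = linear_warmup_linear_decay(step, total_steps=TOTAL_STEPS, **ENCODER_SCHEDULE)
    dec = linear_warmup_linear_decay(step, total_steps=TOTAL_STEPS, **DECODER_SCHEDULE)
    return enc, dec


if __name__ == "__main__":
    for step in (50, 100, 200, 600, 1000):
        enc_lr, dec_lr = learning_rates(step)
        print(f"step {step:4d}  encoder lr {enc_lr:.6f}  decoder lr {dec_lr:.6f}")
```

In a real training loop, each of the two rates would be fed to its own optimizer instance, one holding only the encoder parameters and one holding only the decoder parameters.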

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections