Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

Dmitrii Aksenov; Julián Moreno-Schneider; Peter Bourgonje; Robert Schwarzenberg; Leonhard Hennig; Georg Rehm

arXiv:2003.13027 · cs.CL · March 31, 2020 · 6 cites

TL;DR

This paper investigates whether conditioning a Transformer-based model on pre-trained BERT and adding locality modeling through convolutional self-attention improve abstractive summarization. It reports ROUGE gains over a baseline on an English dataset (CNN/Daily Mail) and a German dataset (SwissText).

Contribution

The paper conditions the encoder and decoder on BERT, proposes a BERT-windowing method for texts longer than the BERT window size, and adds locality modeling via convolutional self-attention in the first encoder layers.

Findings

01  The models outperform the baseline in ROUGE scores on the CNN/Daily Mail and SwissText datasets.

02  Locality modeling restricts attention to the local context, which improves summary quality.

03  BERT conditioning and locality modeling are effective on both English and German data.
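
To make the locality idea in the findings concrete, here is a minimal sketch of attention restricted to a local neighbourhood. It is a simplified 1-dimensional stand-in written for illustration and does not attempt to reproduce the 2-dimensional convolutional self-attention named in the abstract. The function name `local_attention_weights` and the `radius` parameter are hypothetical.

```python
import math


def local_attention_weights(scores, radius):
    """Row-wise softmax over `scores`, restricted to positions j with |i - j| <= radius.

    `scores` is a square list of lists of raw attention scores (one row per query).
    Positions outside the local window receive weight 0.0.
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        local = row[lo:hi]
        peak = max(local)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in local]
        total = sum(exps)
        weights.append([0.0] * lo + [e / total for e in exps] + [0.0] * (n - hi))
    return weights


# Uniform scores over 4 tokens, each query sees at most one neighbour per side.
uniform = [[0.0] * 4 for _ in range(4)]
w = local_attention_weights(uniform, radius=1)
```

With uniform scores, each query spreads its weight evenly over its visible neighbours only, so distant tokens contribute nothing to that layer's output.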

Abstract

We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both…
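
The chunk-wise processing described in the abstract can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it assumes overlapping windows with a fixed stride and averages the vectors of tokens covered by more than one window. The names `window_spans`, `encode_long`, and `encode_window` are hypothetical, and `encode_window` stands in for any encoder that returns one vector per input token.

```python
def window_spans(n_tokens, window_size, stride):
    """Return (start, end) spans that cover range(n_tokens) with overlapping windows."""
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + window_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def encode_long(tokens, encode_window, window_size=512, stride=256):
    """Encode a long token list window by window and average overlapping positions.

    `encode_window` is a hypothetical callable: list of tokens -> list of vectors
    (one list of floats per token). Averaging is an assumption of this sketch.
    """
    sums = [None] * len(tokens)
    counts = [0] * len(tokens)
    for start, end in window_spans(len(tokens), window_size, stride):
        for offset, vec in enumerate(encode_window(tokens[start:end])):
            pos = start + offset
            if sums[pos] is None:
                sums[pos] = list(vec)
            else:
                sums[pos] = [a + b for a, b in zip(sums[pos], vec)]
            counts[pos] += 1
    return [[v / c for v in s] for s, c in zip(sums, counts)]


# Toy encoder: each token id becomes a 1-dimensional vector holding its own value.
toy_encoder = lambda chunk: [[float(t)] for t in chunk]
encoded = encode_long(list(range(5)), toy_encoder, window_size=4, stride=2)
```

A stride smaller than the window size gives every token some context on both sides in at least one window, at the cost of encoding the overlapping tokens more than once.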

Code & Models

Repositories

axenov/BERT-Summ-OpenNMT
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Transformer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay