Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov, Julián Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, and Georg Rehm

TL;DR
This paper investigates how conditioning Transformer-based models on the pre-trained BERT language model and incorporating locality modeling through convolutional self-attention improve abstractive summarization, with the resulting models outperforming a baseline on English (CNN/Daily Mail) and German (SwissText) datasets.
Contribution
The paper introduces BERT conditioning, a novel BERT-windowing method for texts longer than the BERT window, and locality modeling via convolutional self-attention, all aimed at improving summarization quality.
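The idea behind BERT-windowing, processing a text longer than the BERT window chunk by chunk, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 512-token window, a hypothetical stride of 256 so that neighbouring chunks overlap, and that embeddings of tokens covered by several chunks are averaged; the paper's actual chunking and merging rules may differ, and special tokens such as [CLS] and [SEP] are ignored. The `encode` argument and `toy_encode` are hypothetical stand-ins for a BERT forward pass.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for a BERT forward pass: one vector per input token.
Encoder = Callable[[Sequence[int]], List[Vector]]


def window_starts(n_tokens: int, window: int, stride: int) -> List[int]:
    """Start offsets of overlapping windows that together cover [0, n_tokens)."""
    assert 0 < stride <= window, "stride must be positive and not exceed the window"
    if n_tokens <= window:
        return [0]
    starts = list(range(0, n_tokens - window + 1, stride))
    # Add a final window flush with the end so the tail is never dropped.
    if starts[-1] + window < n_tokens:
        starts.append(n_tokens - window)
    return starts


def bert_windowing(tokens: Sequence[int], encode: Encoder,
                   window: int = 512, stride: int = 256) -> List[Vector]:
    """Encode a long sequence chunk-wise and average embeddings where chunks overlap."""
    n = len(tokens)
    sums: List[Vector] = [[] for _ in range(n)]
    counts = [0] * n
    for start in window_starts(n, window, stride):
        chunk = tokens[start:start + window]
        for offset, vec in enumerate(encode(chunk)):
            pos = start + offset
            # The first chunk that sees this token initialises its running sum.
            if counts[pos] == 0:
                sums[pos] = list(vec)
            else:
                sums[pos] = [a + b for a, b in zip(sums[pos], vec)]
            counts[pos] += 1
    return [[x / c for x in s] for s, c in zip(sums, counts)]


def toy_encode(chunk: Sequence[int]) -> List[Vector]:
    """Toy 'contextual' encoder: [token id, position inside the chunk]."""
    return [[float(t), float(i)] for i, t in enumerate(chunk)]


if __name__ == "__main__":
    out = bert_windowing(list(range(10)), toy_encode, window=4, stride=2)
    print(out[2])  # token 2 is seen at chunk offsets 2 and 0 -> [2.0, 1.0]
```

The overlap gives tokens near a chunk boundary context from both sides, at the cost of extra encoder passes; a stride equal to the window size would reduce this to plain non-overlapping chunking.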
Findings
The proposed models outperform the baseline in ROUGE scores on the CNN/Daily Mail and SwissText datasets.
Locality modeling improves focus on relevant context, boosting summary quality.
BERT conditioning and locality modeling are effective for both English and German.
Abstract
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both…
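Locality modeling with 2-dimensional convolutional self-attention can be illustrated with the sketch below. It is an assumption-laden illustration, not the paper's code: a query at position i in head h scores only the keys within `pos_radius` positions of i, taken from heads within `head_radius` of h, and the softmax is computed over that local (head x position) cell set alone. The radius names and the plain-list layout `[head][position][feature]` are hypothetical; setting `head_radius=0` reduces it to a 1-dimensional, per-head local window.

```python
import math
from typing import List

Matrix = List[List[float]]  # [position][feature]
Tensor3 = List[Matrix]      # [head][position][feature]


def conv_self_attention_2d(q: Tensor3, k: Tensor3, v: Tensor3,
                           pos_radius: int, head_radius: int) -> Tensor3:
    """Scaled dot-product attention restricted to a local (head x position) window."""
    n_heads, seq_len = len(q), len(q[0])
    scale = 1.0 / math.sqrt(len(q[0][0]))
    out: Tensor3 = []
    for h in range(n_heads):
        # Neighbouring heads whose keys and values this head may attend to.
        heads = range(max(0, h - head_radius), min(n_heads, h + head_radius + 1))
        head_out: Matrix = []
        for i in range(seq_len):
            # Local positional window centred on the query position i.
            positions = range(max(0, i - pos_radius), min(seq_len, i + pos_radius + 1))
            cells = [(hh, j) for hh in heads for j in positions]
            scores = [scale * sum(a * b for a, b in zip(q[h][i], k[hh][j]))
                      for hh, j in cells]
            # Numerically stable softmax over the local cells only.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            vec = [0.0] * len(v[0][0])
            for w, (hh, j) in zip(weights, cells):
                for d, x in enumerate(v[hh][j]):
                    vec[d] += (w / total) * x
            head_out.append(vec)
        out.append(head_out)
    return out


if __name__ == "__main__":
    # All-zero keys give uniform weights, so each output averages its local values.
    q = [[[1.0], [0.5], [-1.0]]]
    k = [[[0.0], [0.0], [0.0]]]
    v = [[[0.0], [3.0], [6.0]]]
    print(conv_self_attention_2d(q, k, v, pos_radius=1, head_radius=0))
    # -> [[[1.5], [3.0], [4.5]]]
```

Because each query sees only nearby positions, distant tokens cannot dilute its attention distribution, which is the intuition behind restricting these layers to the local context.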
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Transformer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay
