A Deep Reinforced Model for Abstractive Summarization
Romain Paulus, Caiming Xiong, Richard Socher

TL;DR
This paper introduces a neural network model with intra-attention over both the input and the generated output, trained with a combination of supervised learning and reinforcement learning, to produce more readable abstractive summaries of longer documents.
Contribution
The paper presents a novel intra-attention mechanism and a hybrid training method, combining supervised word prediction with reinforcement learning, that together improve the coherence and readability of abstractive summaries.
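To make the idea of attending separately over the input and over the output generated so far more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the bilinear scoring form, the zero vector used at the first decoding step, and the names `attend`, `decoder_step_contexts`, `w_enc`, and `w_dec` are assumptions chosen for illustration.

```python
import math

def attend(query, states, weight):
    """Return (attention weights, context vector) for one query over a list of states."""
    # Bilinear score query^T * weight * state (the scoring form is an assumption).
    projected = [sum(query[i] * weight[i][j] for i in range(len(query)))
                 for j in range(len(weight[0]))]
    scores = [sum(p * s for p, s in zip(projected, state)) for state in states]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the states.
    dim = len(states[0])
    context = [sum(w * state[k] for w, state in zip(weights, states))
               for k in range(dim)]
    return weights, context

def decoder_step_contexts(decoder_state, encoder_states,
                          previous_decoder_states, w_enc, w_dec):
    """Compute one context over the input and one over the output generated so far."""
    _, input_context = attend(decoder_state, encoder_states, w_enc)
    if previous_decoder_states:
        _, output_context = attend(decoder_state, previous_decoder_states, w_dec)
    else:
        # Nothing has been generated yet at the first step (assumed zero vector).
        output_context = [0.0] * len(decoder_state)
    return input_context, output_context
```

In a full model, both context vectors would typically be combined with the current decoder state before predicting the next token, so the decoder can see which input positions and which earlier outputs it has already attended to.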
Findings
Achieves a ROUGE-1 score of 41.16 on the CNN/Daily Mail dataset.
Outperforms the previous state of the art on automatic metrics.
Human evaluation confirms that the summaries are of higher quality.
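For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration, assuming lowercased whitespace tokenization; the official ROUGE toolkit applies its own tokenization and options, so its numbers will differ. Scores such as 41.16 are presumably reported on a 0 to 100 scale, whereas this function returns values between 0 and 1. The function name `rouge_1` is hypothetical.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Simplified ROUGE-1: unigram precision, recall, and F1 on whitespace tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5.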
Abstract
Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit "exposure bias" - they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1…
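The combination of supervised word prediction and RL described in the abstract can be pictured as a weighted sum of two losses. The sketch below is one way to write this down, not the authors' code: the abstract shown here does not state the RL estimator or the weighting, so the self-critical baseline (comparing a sampled summary's reward with that of a greedily decoded one) and the mixing weight `gamma` are assumptions, and all function names are hypothetical.

```python
import math

def supervised_loss(gold_log_probs):
    """Negative log-likelihood of the ground-truth tokens (teacher forcing)."""
    return -sum(gold_log_probs)

def self_critical_loss(sample_log_probs, sample_reward, baseline_reward):
    """Policy-gradient loss with a baseline (assumed: reward of a greedy decode).

    Minimizing this raises the probability of sampled summaries that score
    higher than the baseline and lowers it for those that score lower.
    """
    return (baseline_reward - sample_reward) * sum(sample_log_probs)

def mixed_loss(ml_loss, rl_loss, gamma):
    """Weighted combination; gamma is a hypothetical mixing weight in [0, 1]."""
    return gamma * rl_loss + (1.0 - gamma) * ml_loss

# Toy numbers: rewards could be a sequence-level metric such as ROUGE.
ml = supervised_loss([math.log(0.5), math.log(0.25)])
rl = self_critical_loss([-1.0, -2.0], sample_reward=0.6, baseline_reward=0.4)
print(mixed_loss(ml, rl, gamma=0.5))
```

The supervised term is computed on ground-truth prefixes, while the RL term scores whole sequences produced by the model itself, which is how a sequence-level reward can address the exposure bias the abstract mentions.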
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
