Abstractive Document Summarization without Parallel Data

Nikola I. Nikolov; Richard H.R. Hahnloser

arXiv:1907.12951 · cs.CL · March 4, 2020 · 5 cites

TL;DR

This paper introduces an abstractive summarization method that needs no parallel article-summary pairs, only separate collections of example summaries and non-matching articles. It shows promising performance on the CNN/DailyMail benchmark and on generating press releases from scientific journal articles.

Contribution

The authors combine an unsupervised sentence extractor, which selects salient sentences, with a sentence abstractor that paraphrases each extracted sentence and is trained on pseudo-parallel and synthetic data, eliminating the need for paired datasets.

Findings

01 · Competitive performance on CNN/DailyMail benchmark

02 · Effective in generating press releases from scientific articles

03 · Outperforms some supervised methods in low-resource settings

Abstract

Abstractive summarization typically relies on large collections of paired articles and summaries. However, in many cases, parallel data is scarce and costly to obtain. We develop an abstractive summarization system that relies only on large collections of example summaries and non-matching articles. Our approach consists of an unsupervised sentence extractor that selects salient sentences to include in the final summary, as well as a sentence abstractor that is trained on pseudo-parallel and synthetic data, that paraphrases each of the extracted sentences. We perform an extensive evaluation of our method: on the CNN/DailyMail benchmark, on which we compare our approach to fully supervised baselines, as well as on the novel task of automatically generating a press release from a scientific journal article, which is well suited for our system. We show promising performance on both tasks,…
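
The two-stage design described in the abstract (extract salient sentences, then paraphrase each one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the centroid-based word-overlap scoring is a stand-in for their unsupervised extractor, the abstractor is passed in as an arbitrary callable rather than a trained paraphrasing model, and the function names `split_sentences`, `extract_salient`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! and ? (an assumption made for this sketch)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_salient(article: str, k: int = 3) -> List[str]:
    """Stand-in extractor: keep the k sentences closest to the document
    centroid, returned in their original order."""
    sentences = split_sentences(article)
    centroid = bag_of_words(article)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(bag_of_words(sentences[i]), centroid),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(article: str, abstractor: Callable[[str], str], k: int = 3) -> str:
    """Extract k sentences, then rewrite each one with the abstractor."""
    return " ".join(abstractor(s) for s in extract_salient(article, k))
```

In this sketch, `summarize(article, abstractor=lambda s: s)` reduces to purely extractive output; a trained sentence-level paraphrasing model would be plugged in as `abstractor` to make the result abstractive.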

Code & Models

Repositories

ninikolov/low_resource_summarization
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management