Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles
Amanuel Alambo, Cori Lohstroh, Erik Madaus, Swati Padhee, Brandy Foster, Tanvi Banerjee, Krishnaprasad Thirunarayan, Michael Raymer

TL;DR
This paper introduces a novel unsupervised framework for multi-document summarization that produces both extractive and abstractive summaries of scientific and news articles, outperforming existing methods on several metrics.
Contribution
It presents a new topic-centric unsupervised approach for multi-document summarization, including a dataset and techniques for salient language unit selection and text generation.
Findings
Achieves state-of-the-art results on extractive summarization metrics.
Performs better on five human evaluation metrics for abstractive summarization.
Plans to release a new dataset for research in this area.
Abstract
Recent advances in natural language processing have enabled automation of a wide range of tasks, including machine translation, named entity recognition, and sentiment analysis. Automated summarization of documents, or groups of documents, however, has remained elusive, with many efforts limited to extraction of keywords, key phrases, or key sentences. Accurate abstractive summarization has yet to be achieved due to the inherent difficulty of the problem, and limited availability of training data. In this paper, we propose a topic-centric unsupervised multi-document summarization framework to generate extractive and abstractive summaries for groups of scientific articles across 20 Fields of Study (FoS) in Microsoft Academic Graph (MAG) and news articles from DUC-2004 Task 2. The proposed algorithm generates an abstractive summary by developing salient language unit selection and text…
