Multi-Document Summarization using Distributed Bag-of-Words Model

Kaustubh Mani; Ishan Verma; Hardik Meisheri; Lipika Dey

arXiv:1710.02745·cs.CL·June 12, 2018

TL;DR

This paper introduces an unsupervised, centroid-based multi-document summarization method built on the distributed bag-of-words model. It selects summary sentences so as to minimize the reconstruction error between the summary and the documents, and reports significant gains over state-of-the-art baselines on two datasets.

Contribution

The paper proposes an unsupervised, document-level reconstruction framework for multi-document summarization based on the distributed bag-of-words model. Summary sentences are chosen to minimize reconstruction error, and sentence selection and beam search are applied to further improve performance.

Findings

01

Significant performance improvements over state-of-the-art baselines.

02

Effective sentence selection via reconstruction error minimization.

03

Gains reported on two different datasets.

Abstract

As the number of documents on the web grows exponentially, multi-document summarization is becoming increasingly important, since it can convey the main ideas of a document set in a short time. In this paper, we present an unsupervised, centroid-based, document-level reconstruction framework using the distributed bag-of-words model. Specifically, our approach selects summary sentences so as to minimize the reconstruction error between the summary and the documents. We apply sentence selection and beam search to further improve the performance of our model. Experimental results on two different datasets show significant performance gains compared with state-of-the-art baselines.
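
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact objective, so this version assumes that sentence vectors have already been computed (for example by a distributed bag-of-words paragraph-vector model), that the document set is represented by the mean of those vectors, and that reconstruction error is the squared Euclidean distance between the summary centroid and that mean. The function names `mean_vector`, `reconstruction_error`, and `beam_search_summary`, and the `beam_width` parameter, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reconstruction_error(summary_vecs: List[Vector], target: Vector) -> float:
    """Squared Euclidean distance between the summary centroid and the target.

    Assumption: this distance stands in for the paper's reconstruction error.
    """
    centroid = mean_vector(summary_vecs)
    return sum((c - t) ** 2 for c, t in zip(centroid, target))


def beam_search_summary(
    sentence_vecs: List[Vector], summary_size: int, beam_width: int = 3
) -> Tuple[List[int], float]:
    """Pick `summary_size` sentence indices whose centroid best matches the
    centroid of all sentences, keeping `beam_width` partial summaries per step."""
    if not sentence_vecs or summary_size < 1 or beam_width < 1:
        raise ValueError("need at least one sentence, summary_size >= 1, beam_width >= 1")
    summary_size = min(summary_size, len(sentence_vecs))

    # Assumption: the document set is represented by the mean sentence vector.
    target = mean_vector(sentence_vecs)

    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    for _ in range(summary_size):
        candidates: Dict[Tuple[int, ...], float] = {}
        for chosen, _ in beams:
            for i in range(len(sentence_vecs)):
                if i in chosen:
                    continue
                # Sort indices so the same subset is scored only once.
                subset = tuple(sorted(chosen + (i,)))
                if subset not in candidates:
                    vecs = [sentence_vecs[j] for j in subset]
                    candidates[subset] = reconstruction_error(vecs, target)
        # Keep the lowest-error partial summaries for the next step.
        beams = sorted(candidates.items(), key=lambda item: item[1])[:beam_width]

    best_subset, best_error = beams[0]
    return list(best_subset), best_error


if __name__ == "__main__":
    # Toy 2-dimensional "sentence embeddings", made up for illustration only.
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    indices, error = beam_search_summary(toy_vectors, summary_size=2)
    print(indices, error)
```

With `beam_width=1` this reduces to greedy selection. A wider beam keeps several partial summaries alive, which can recover combinations of sentences that are individually far from the centroid but jointly close to it.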
