Investigating Entropy for Extractive Document Summarization

Alka Khurana; Vasudha Bhatnagar

arXiv:2109.10886·cs.IR·October 1, 2021

TL;DR

This paper introduces E-Summ, an unsupervised, explainable extractive summarization algorithm built on Shannon entropy and Non-negative Matrix Factorization (NMF). It is fast, domain-independent, and suitable for real-time applications, but needs further enhancement to match neural methods.

Contribution

The paper proposes a novel entropy-based extractive summarization method using NMF, emphasizing explainability, efficiency, and language independence.

Findings

01. E-Summ is fast and domain-agnostic.

02. It provides transparent and explainable summaries.

03. Performance is promising but still falls short of neural approaches.

Abstract

Automatic text summarization aims to cut down readers' time and cognitive effort by reducing the content of a text document without compromising its essence. Ergo, informativeness is the prime attribute of a document summary generated by an algorithm, and selecting sentences that capture the essence of a document is the primary goal of extractive document summarization. In this paper, we employ Shannon entropy to capture the informativeness of sentences. We employ Non-negative Matrix Factorization (NMF) to reveal probability distributions for computing the entropy of terms, topics, and sentences in latent space. We present an information-theoretic interpretation of the computed entropy, which is the bedrock of the proposed E-Summ algorithm, an unsupervised method for extractive document summarization. The algorithm systematically applies information theoretic principle for selecting informative…
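The abstract describes turning non-negative factors into probability distributions and computing their Shannon entropy. The sketch below shows only that general step and is not the E-Summ algorithm itself, whose selection rule is not given in the text above. The matrix `H`, its values, and the helper names `shannon_entropy` and `row_distributions` are hypothetical; `H` is a hand-written stand-in for a sentence-by-topic factor that an NMF routine might produce.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution.

    Zero-probability entries are skipped, following the convention 0*log(0) = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


def row_distributions(matrix):
    """Normalize each non-negative row so it sums to 1.

    Rows that are entirely zero are left as zeros rather than divided by zero.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


# Hypothetical sentence-by-topic weights (assumption: one row per sentence,
# one column per latent topic). These numbers are made up for illustration.
H = [
    [0.9, 0.1, 0.0],  # weight concentrated on one topic
    [1.0, 1.0, 1.0],  # weight spread evenly over three topics
    [0.0, 2.0, 2.0],  # weight split evenly over two topics
]

# One entropy value per sentence row.
entropies = [shannon_entropy(p) for p in row_distributions(H)]

for index, value in enumerate(entropies):
    print(f"sentence {index}: entropy = {value:.3f} bits")
```

A row whose weight is concentrated on one topic gets low entropy, while an evenly spread row gets the maximum, log2 of the number of topics. How such values should be combined to rank sentences is a design decision the paper makes and this sketch does not attempt to reproduce.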
