An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

Huan Yee Koh; Jiaxin Ju; Ming Liu; Shirui Pan

arXiv:2207.00939·cs.CL·July 5, 2022

TL;DR

This paper provides a comprehensive survey of long document summarization, analyzing datasets, models, and metrics, and offers insights into current research challenges and future directions in the field.

Contribution

It systematically reviews and empirically analyzes the key components of long document summarization research, including datasets, models, and evaluation metrics.

Findings

1. Benchmark datasets have diverse intrinsic characteristics.
2. Models vary significantly in multi-dimensional performance.
3. Evaluation metrics show limitations in capturing summarization quality.
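This listing does not say which intrinsic characteristics of a dataset are measured. For illustration only, assume three widely used ones: compression ratio (article length divided by summary length) and the extractive coverage and density measures of Grusky et al. (2018), which are computed from greedily matched shared token spans. The helper names (`tokenize`, `extractive_fragments`, `dataset_stats`) and the naive whitespace tokenizer are hypothetical choices made for this sketch, not the survey's own code.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Hypothetical helper: naive lowercase whitespace tokenization (an assumption;
    # a real analysis would likely use a proper tokenizer).
    return text.lower().split()


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily collect the longest shared token spans, in the style of Grusky et al. (2018)."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best: List[str] = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences agree.
                i2, j2 = i, j
                while i2 < len(summary) and j2 < len(article) and summary[i2] == article[j2]:
                    i2 += 1
                    j2 += 1
                if i2 - i > len(best):
                    best = summary[i:i2]
                j = j2
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip past the matched span, or advance one token if nothing matched.
        i += max(len(best), 1)
    return fragments


def dataset_stats(article_text: str, summary_text: str) -> Dict[str, float]:
    """Hypothetical per-pair statistics: compression ratio, extractive coverage and density."""
    article = tokenize(article_text)
    summary = tokenize(summary_text)
    if not summary:
        raise ValueError("summary must contain at least one token")
    frags = extractive_fragments(article, summary)
    n = len(summary)
    return {
        "compression": len(article) / n,
        "coverage": sum(len(f) for f in frags) / n,
        "density": sum(len(f) ** 2 for f in frags) / n,
    }


stats = dataset_stats("the cat sat on the mat today", "the cat sat today")
print(stats)  # {'compression': 1.75, 'coverage': 1.0, 'density': 2.5}
```

Averaging such per-pair values over a corpus is one way to contrast benchmarks: higher compression means more condensation is required, while higher coverage and density indicate summaries that copy longer spans from the source.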

Abstract

Long documents such as academic articles and business reports have been the standard format to detail important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation…

Code & Models

Repositories

huankoh/long-doc-summarization (official)
