TL;DR
This paper provides a comprehensive survey of long document summarization, analyzing datasets, models, and metrics, and offers insights into current research challenges and future directions in the field.
Contribution
It systematically reviews and empirically analyzes the key components of long document summarization research, including datasets, models, and evaluation metrics.
Findings
Benchmark datasets have diverse intrinsic characteristics.
Models vary significantly in multi-dimensional performance.
Evaluation metrics show limitations in capturing summarization quality.
Abstract
Long documents such as academic articles and business reports have been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts that encapsulate the most important information would thus significantly aid the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation…
