Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa, Teruko Mitamura

TL;DR
This paper compares compression-based and full-text methods for large-scale multi-document summarization (MDS). Full-text and retrieval-based methods perform best overall, and the results point toward hybrid approaches that combine the two classes.
Contribution
The paper evaluates compression and full-text MDS methods on three datasets with roughly one hundred documents per summary, identifies the strengths and limitations of each class, and suggests hybrid approaches for improved performance.
Findings
Full-text and retrieval-augmented methods perform best overall.
Compression methods retain salient information at intermediate stages.
Hybrid approaches could combine strengths of both methods.
Abstract
Automatically summarizing large text collections is a valuable tool for document research, with applications in journalism, academic research, legal work, and many other fields. In this work, we contrast two classes of systems for large-scale multi-document summarization (MDS): compression and full-text. Compression-based methods use a multi-stage pipeline and often lead to lossy summaries. Full-text methods promise a lossless summary by relying on recent advances in long-context reasoning. To understand their utility on large-scale MDS, we evaluated them on three datasets, each containing approximately one hundred documents per summary. Our experiments cover a diverse set of long-context transformers (Llama-3.1, Command-R, Jamba-1.5-Mini) and compression methods (retrieval-augmented, hierarchical, incremental). Overall, we find that full-text and retrieval methods perform the best in…
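The two classes of systems contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `truncate_summarizer` is a hypothetical stand-in for a long-context model call, the word-overlap ranking in `retrieval` is an assumed placeholder for a real retriever, and every function name and parameter below is introduced only for illustration.

```python
from typing import Callable, List

# A summarizer maps (text, word budget) to a shorter text.
Summarizer = Callable[[str, int], str]


def truncate_summarizer(text: str, max_words: int) -> str:
    """Hypothetical stand-in for a model call: keep the first max_words words."""
    return " ".join(text.split()[:max_words])


def full_text(docs: List[str], summarize: Summarizer, max_words: int) -> str:
    """Full-text: pass the whole collection to the summarizer in one call."""
    return summarize("\n".join(docs), max_words)


def hierarchical(docs: List[str], summarize: Summarizer, max_words: int,
                 group_size: int = 2) -> str:
    """Hierarchical compression: summarize each document, then merge
    summaries group by group until a single summary remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not docs:
        return ""
    level = [summarize(d, max_words) for d in docs]
    while len(level) > 1:
        level = [summarize("\n".join(level[i:i + group_size]), max_words)
                 for i in range(0, len(level), group_size)]
    return level[0]


def incremental(docs: List[str], summarize: Summarizer, max_words: int) -> str:
    """Incremental compression: fold documents one at a time into a
    running summary."""
    summary = ""
    for d in docs:
        summary = summarize((summary + "\n" + d).strip(), max_words)
    return summary


def retrieval(docs: List[str], query: str, summarize: Summarizer,
              max_words: int, top_k: int = 2) -> str:
    """Retrieval-augmented compression: keep the top_k documents by
    word overlap with the query (an assumed scoring rule), then summarize."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return summarize("\n".join(ranked[:top_k]), max_words)


if __name__ == "__main__":
    docs = ["alpha beta gamma", "delta epsilon",
            "flood hits city", "city flood relief"]
    for name, out in [
        ("full_text", full_text(docs, truncate_summarizer, 3)),
        ("hierarchical", hierarchical(docs, truncate_summarizer, 3)),
        ("incremental", incremental(docs, truncate_summarizer, 3)),
        ("retrieval", retrieval(docs, "city flood", truncate_summarizer, 3)),
    ]:
        print(name, "->", out)
```

In the compression variants, every intermediate call discards text before the final summary is written, which is one way to see why a multi-stage pipeline can be lossy, whereas the full-text variant sees the entire collection at once.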
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Web Data Mining and Analysis
Methods: Sparse Evolutionary Training
