Summarization is (Almost) Dead
Xiao Pu, Mingqi Gao, Xiaojun Wan

TL;DR
Large language models (LLMs) show strong zero-shot summarization: human evaluators often prefer their summaries to human-written ones and to those of fine-tuned models, with LLM summaries showing better factual consistency. This calls into question the need for much traditional summarization research.
Contribution
The paper builds new datasets and runs human evaluation experiments to assess the zero-shot summarization ability of LLMs across five distinct summarization tasks. It finds that LLM summaries are preferred over those produced by traditional approaches.
Findings
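LLM summaries are preferred over human-written and fine-tuned-model summaries and show better factual consistency
LLMs generate summaries with fewer extrinsic hallucinations
Most conventional summarization research may no longer be necessary, though some directions, such as building higher-quality datasets, remain worth exploring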
Abstract
How well can large language models (LLMs) generate summaries? We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of LLMs across five distinct summarization tasks. Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models. Specifically, LLM-generated summaries exhibit better factual consistency and fewer instances of extrinsic hallucinations. Due to the satisfactory performance of LLMs in summarization tasks (even surpassing the benchmark of reference summaries), we believe that most conventional works in the field of text summarization are no longer necessary in the era of LLMs. However, we recognize that there are still some directions worth exploring, such as the creation of novel datasets with higher quality and more…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
