SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era
Han Jang, Junhyeok Lee, Kyu Sung Choi

TL;DR
SciZoom is a benchmark dataset of 44,946 scientific papers designed to evaluate hierarchical summarization methods and to analyze how scientific writing changed after the widespread adoption of LLMs such as ChatGPT.
Contribution
The paper introduces a large-scale, multi-granularity scientific summarization benchmark spanning the Pre-LLM and Post-LLM eras, enabling research on both summarization and linguistic evolution in scientific discourse.
Findings
Phrase patterns and rhetorical styles shift significantly after LLM adoption.
Scientific writing produced with LLM assistance reads as more confident but also more homogenized.
The dataset is publicly available for future research in scientific summarization and discourse analysis.
Abstract
The explosive growth of AI research has created unprecedented information overload, increasing the demand for scientific summarization at multiple levels of granularity beyond traditional abstracts. While LLMs are increasingly adopted for summarization, existing benchmarks remain limited in scale, target only a single granularity, and predate the LLM era. Moreover, since the release of ChatGPT in November 2022, researchers have rapidly adopted LLMs for drafting manuscripts themselves, fundamentally transforming scientific writing, yet no resource exists to analyze how this writing has evolved. To bridge these gaps, we introduce SciZoom, a benchmark comprising 44,946 papers from four top-tier ML venues (NeurIPS, ICLR, ICML, EMNLP) spanning 2020 to 2025, explicitly stratified into Pre-LLM and Post-LLM eras. SciZoom provides three hierarchical summarization targets (Abstract,…
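The Pre-LLM / Post-LLM stratification described in the abstract can be illustrated with a minimal sketch. It assumes the split is made on a publication date relative to ChatGPT's public release on 30 November 2022; the abstract does not state the exact rule SciZoom uses, so the cutoff, the `era` and `stratify` helpers, and the record fields (`title`, `published`) are all hypothetical.

```python
from datetime import date

# Assumption: ChatGPT's public release date serves as the era boundary.
# The benchmark's actual stratification rule may differ (e.g. by venue year).
CHATGPT_RELEASE = date(2022, 11, 30)


def era(published: date) -> str:
    """Label a paper 'Pre-LLM' or 'Post-LLM' by its publication date."""
    return "Pre-LLM" if published < CHATGPT_RELEASE else "Post-LLM"


def stratify(papers):
    """Group paper titles by era.

    `papers` is an iterable of dicts with hypothetical keys
    'title' (str) and 'published' (datetime.date).
    """
    buckets = {"Pre-LLM": [], "Post-LLM": []}
    for paper in papers:
        buckets[era(paper["published"])].append(paper["title"])
    return buckets


if __name__ == "__main__":
    # Placeholder records, not entries from the dataset.
    sample = [
        {"title": "Paper A", "published": date(2021, 12, 6)},
        {"title": "Paper B", "published": date(2023, 12, 10)},
    ]
    print(stratify(sample))
```

A date-based cutoff is only one plausible choice; splitting by conference year would give slightly different groups for papers written close to the boundary.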
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Text Readability and Simplification
