CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers
Javin Liu, Aryan Vats, Zihao He

TL;DR
CS-PaperSum is a large dataset of over 91,000 computer science papers with AI-generated structured summaries, facilitating automated literature analysis and research trend identification.
Contribution
We created a large-scale dataset of computer science papers with AI-generated structured summaries, enabling automated analysis of research trends and supporting AI-driven scientific discovery.
Findings
Strong preservation of key concepts in summaries
Identification of emerging research methodologies
Insights into interdisciplinary research trends
Abstract
The rapid expansion of scientific literature in computer science presents challenges in tracking research trends and extracting key insights. Existing datasets provide metadata but lack structured summaries that capture core contributions and methodologies. We introduce CS-PaperSum, a large-scale dataset of 91,919 papers from 31 top-tier computer science conferences, enriched with AI-generated structured summaries using ChatGPT. To assess summary quality, we conduct embedding alignment analysis and keyword overlap analysis, demonstrating strong preservation of key concepts. We further present a case study on AI research trends, highlighting shifts in methodologies and interdisciplinary crossovers, including the rise of self-supervised learning, retrieval-augmented generation, and multimodal AI. Our dataset enables automated literature analysis, research trend forecasting, and AI-driven…
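The abstract names two quality checks, keyword overlap and embedding alignment, without giving their exact formulas. The sketch below assumes that keyword overlap is the Jaccard similarity between stopword-filtered token sets of an abstract and its summary, and that embedding alignment is the cosine similarity between their embedding vectors. The function names, the stopword list, and the toy vectors are hypothetical. Real use would substitute the paper's own keyword extractor and embedding model, which are not specified here.

```python
import math
import re

# Hypothetical stopword list; the paper's keyword extraction method is not stated.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on",
             "with", "we", "is", "are", "by"}


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric tokens minus stopwords (a simple stand-in for keyword extraction)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def keyword_jaccard(source: str, summary: str) -> float:
    """Jaccard overlap |A & B| / |A | B| between source and summary keyword sets (assumed metric)."""
    src, summ = keywords(source), keywords(summary)
    union = src | summ
    return len(src & summ) / len(union) if union else 0.0


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    abstract = "We introduce a dataset of computer science papers with structured summaries."
    summary = "A dataset of computer science papers with summaries."
    print(f"keyword overlap: {keyword_jaccard(abstract, summary):.3f}")  # 5 shared of 7 keywords -> 0.714
    # Toy 2-d vectors standing in for real sentence embeddings (hypothetical).
    print(f"embedding alignment: {cosine_similarity([1.0, 1.0], [1.0, 0.0]):.3f}")  # -> 0.707
```

Jaccard is symmetric and length-neutral. A recall-style score (the share of abstract keywords kept in the summary) would be a reasonable alternative if the goal is to measure concept preservation specifically.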
Taxonomy
Topics: Machine Learning in Materials Science · Biomedical Text Mining and Ontologies · Topic Modeling
