D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research
Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp

TL;DR
This paper introduces D3, a comprehensive dataset of over 6 million computer science publications from DBLP, enabling analysis of research trends, productivity, and impact, with initial findings showing growth and evolving topic focus.
Contribution
The paper presents the creation of D3, a large-scale, publicly available dataset of scholarly metadata, and provides initial analyses of research activity and trends in computer science.
Findings
Computer science research is growing at approximately 15% annually.
Recent papers list more bibliographic references than earlier ones but receive fewer citations on average.
Trends in research topics are clearly visible in D3.
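To put the reported growth figure in perspective, the sketch below converts an annual growth rate into a doubling time and a multi-year growth factor. It assumes the roughly 15% rate compounds uniformly from year to year, which is a simplification for illustration and not a claim made by the paper; the function names are hypothetical.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: int) -> float:
    """Multiplicative growth after `years` of constant compound growth."""
    return (1 + annual_rate) ** years


# Assumption: a constant 15% annual rate, taken from the summary above.
rate = 0.15
print(f"doubling time: {doubling_time(rate):.2f} years")      # 4.96 years
print(f"growth over 10 years: x{growth_factor(rate, 10):.2f}")  # x4.05
```

Under this simplification, yearly publication volume would double about every five years and roughly quadruple over a decade.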
Abstract
DBLP is the largest open-access repository of scientific articles on computer science and provides metadata associated with publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trends in research activity, productivity, focus, bias, accessibility, and impact of computer science research. We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns. Our findings show that computer science is a growing research field (approx. 15% annually), with an active and collaborative researcher community. While papers in recent years present more…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Scientometrics and Bibliometrics Research
