SciEvo: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis
Yiqiao Jin, Yijia Xiao, Yiyang Wang, Jindong Wang

TL;DR
SciEvo is a comprehensive, longitudinal dataset of over two million publications spanning 30 years, enabling detailed analysis of scientific knowledge evolution and interdisciplinary citation patterns.
Contribution
We present SciEvo, a large-scale, accessible scientometric dataset with tools for cross-disciplinary temporal analysis of scientific literature.
Findings
Application-driven fields such as large language models (LLMs) have shorter citation ages.
Traditional disciplines such as oral history have longer citation ages.
Epistemic cultures and citation practices differ across fields.
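Citation age is commonly measured as the gap, in years, between a citing paper's publication year and the publication year of each paper it cites. Below is a minimal sketch of a per-field average. It assumes that definition (SciEvo's exact formula may differ) and uses hypothetical toy records, not the real dataset or its schema. The function name `mean_citation_age` is illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (citing_field, citing_year, cited_year).
# Illustrative values only, not SciEvo data or its schema.
CITATIONS = [
    ("LLM", 2023, 2022),
    ("LLM", 2023, 2020),
    ("LLM", 2024, 2023),
    ("oral history", 2020, 1995),
    ("oral history", 2021, 1980),
]


def mean_citation_age(citations):
    """Return {field: mean citation age in years}.

    Assumes citation age = citing year - cited year.
    """
    ages = defaultdict(list)
    for field, citing_year, cited_year in citations:
        ages[field].append(citing_year - cited_year)
    return {field: mean(values) for field, values in ages.items()}


if __name__ == "__main__":
    for field, age in sorted(mean_citation_age(CITATIONS).items()):
        print(f"{field}: {age:.1f} years")
```

The toy numbers are chosen only to exercise the code. They carry no empirical weight.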
Abstract
Understanding the creation, evolution, and dissemination of scientific knowledge is crucial for bridging diverse subject areas and addressing complex global challenges such as pandemics, climate change, and ethical AI. Scientometrics, the quantitative and qualitative study of scientific literature, provides valuable insights into these processes. We introduce SciEvo, a longitudinal scientometric dataset with over two million academic publications, providing comprehensive content information and citation graphs to support cross-disciplinary analyses. SciEvo is easy to use and available across platforms, including GitHub, Kaggle, and HuggingFace. Using SciEvo, we conduct a temporal study spanning over 30 years to explore key questions in scientometrics: the evolution of academic terminology, citation patterns, and interdisciplinary knowledge exchange. Our findings reveal critical…
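One way to quantify interdisciplinary knowledge exchange over a citation graph is to count citation edges by (citing field, cited field) pair. Below is a minimal sketch. It assumes each paper carries a single field label and that the graph is a list of (citing_id, cited_id) pairs. The paper IDs, field names, and helper functions are hypothetical and are not SciEvo's actual schema or tooling.

```python
from collections import Counter

# Hypothetical paper-to-field labels (real papers may carry several categories).
PAPER_FIELD = {
    "p1": "NLP",
    "p2": "NLP",
    "p3": "physics",
    "p4": "biology",
}

# Hypothetical citation edges: (citing paper, cited paper).
EDGES = [("p1", "p2"), ("p1", "p3"), ("p2", "p4"), ("p3", "p4"), ("p4", "p3")]


def cross_field_counts(edges, paper_field):
    """Count citations per (citing field, cited field), skipping unlabeled papers."""
    counts = Counter()
    for citing, cited in edges:
        if citing in paper_field and cited in paper_field:
            counts[(paper_field[citing], paper_field[cited])] += 1
    return counts


def interdisciplinary_share(counts):
    """Fraction of counted citations whose citing and cited fields differ."""
    total = sum(counts.values())
    cross = sum(n for (src, dst), n in counts.items() if src != dst)
    return cross / total if total else 0.0


if __name__ == "__main__":
    counts = cross_field_counts(EDGES, PAPER_FIELD)
    for (src, dst), n in sorted(counts.items()):
        print(f"{src} -> {dst}: {n}")
    print(f"interdisciplinary share: {interdisciplinary_share(counts):.2f}")
```

A single-label mapping keeps the sketch short. With multi-label papers, each edge would need to be split or weighted across label pairs.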
Taxonomy
Topics: Computational and Text Analysis Methods · Scientific Computing and Data Management · Big Data Technologies and Applications
