Network-based statistical comparison of citation topology of bibliographic databases
Lovro Šubelj, Dalibor Fiala, Marko Bajec

TL;DR
This study compares the citation network structures of six major bibliographic databases, revealing significant topological inconsistencies and assessing their reliability for scientific research and evaluation.
Contribution
It introduces a comprehensive topological comparison framework for bibliographic databases, highlighting their differences and reliability based on citation network analysis.
Findings
Web of Science is the most consistent database.
arXiv.org has the most exhaustive citation information.
DBLP's field bow-tie decomposition differs substantially from the other databases because of its coverage.
Abstract
Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions on their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the…
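As a rough illustration of what "local and global graph statistics" can mean for a citation network, the sketch below computes a few generic quantities from a list of (citing, cited) pairs. The function name `citation_stats`, the chosen statistics, and the toy edge list are assumptions made for illustration; this is not the paper's set of statistics and it does not implement the field bow-tie decomposition.

```python
from collections import defaultdict, deque


def citation_stats(edges):
    """Return a few basic statistics of a directed citation network.

    edges: iterable of (citing, cited) pairs (hypothetical input format).
    """
    out_nbrs = defaultdict(set)  # paper -> papers it cites
    in_nbrs = defaultdict(set)   # paper -> papers citing it
    nodes = set()
    for citing, cited in edges:
        if citing == cited:
            continue  # ignore self-citations in this sketch
        out_nbrs[citing].add(cited)
        in_nbrs[cited].add(citing)
        nodes.update((citing, cited))

    n = len(nodes)
    m = sum(len(cited) for cited in out_nbrs.values())

    # Size of the largest weakly connected component (BFS, direction ignored).
    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in out_nbrs.get(u, set()) | in_nbrs.get(u, set()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)

    uncited = sum(1 for u in nodes if not in_nbrs.get(u))
    return {
        "nodes": n,
        "edges": m,
        "mean_degree": 2 * m / n if n else 0.0,          # in- plus out-degree
        "wcc_fraction": largest / n if n else 0.0,       # share in largest WCC
        "uncited_fraction": uncited / n if n else 0.0,   # papers never cited
    }


# Toy network: B and C cite A, C cites B, E cites D.
toy_edges = [("B", "A"), ("C", "A"), ("C", "B"), ("E", "D")]
stats = citation_stats(toy_edges)
```

Computing the same dictionary for networks extracted from different databases would give one vector of statistics per database, which is the kind of input a statistical comparison across databases needs.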
