A Comparison of On-Line Computer Science Citation Databases
Vaclav Petricek, Ingemar J. Cox, Hui Han, Isaac G. Councill, C. Lee Giles

TL;DR
This paper compares two online computer science citation databases, DBLP and CiteSeer, highlighting their differences, biases, and coverage, and models their citation distributions to understand their limitations for research assessment.
Contribution
It provides a detailed comparison of DBLP and CiteSeer, models CiteSeer's bias against single-author papers, and uses that model to estimate DBLP's coverage of the computer science literature.
Findings
CiteSeer contains considerably fewer single-author papers than DBLP; this bias can be modeled by an exponential process.
DBLP covers approximately 24% of the entire computer science literature, as predicted by the model.
The number of authors per paper increases over time in both databases.
Abstract
This paper examines the differences and similarities between two on-line computer science citation databases, DBLP and CiteSeer. Entries in DBLP are inserted manually, while CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer's autonomous citation database can therefore be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also…
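The abstract does not spell out the exponential model, so the following is only a minimal sketch of one plausible form, not the authors' method. It assumes (hypothetically) that each of a paper's n authors independently puts it on the Web with probability p, so a crawler finds it with probability 1 - (1 - p)^n = 1 - e^(n ln(1 - p)), which is exponential in n. The function names `inclusion_probability` and `observed_distribution`, the parameter `p`, and the example distribution are all illustrative assumptions.

```python
def inclusion_probability(n_authors: int, p: float) -> float:
    """Hypothetical model: each of n authors independently posts the paper
    online with probability p; the paper is crawlable if at least one does."""
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # 1 - (1 - p)^n approaches 1 exponentially fast as n grows.
    return 1.0 - (1.0 - p) ** n_authors


def observed_distribution(true_counts: dict[int, float], p: float) -> dict[int, float]:
    """Reweight a (hypothetical) true author-count distribution by the
    inclusion probability and renormalise: the mix a crawler would see."""
    weighted = {n: c * inclusion_probability(n, p) for n, c in true_counts.items()}
    total = sum(weighted.values())
    return {n: w / total for n, w in weighted.items()}


if __name__ == "__main__":
    # Illustrative input only: equal numbers of 1-, 2- and 3-author papers.
    seen = observed_distribution({1: 1.0, 2: 1.0, 3: 1.0}, p=0.5)
    for n, share in sorted(seen.items()):
        print(f"{n} author(s): {share:.3f}")
```

Under these assumptions the single-author share falls below its true value of 1/3 (to 0.5 / 2.125, about 0.235), which reproduces the direction of the bias reported for CiteSeer.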
Taxonomy
Topics: Web Data Mining and Analysis · Peer-to-Peer Network Technologies · Complex Network Analysis Techniques
