Finding citations for PubMed: A large-scale comparison between five freely available bibliographic data sources
Zhentao Liang, Jin Mao, Kun Lu, Gang Li

TL;DR
This study compares five free bibliographic data sources for PubMed. It evaluates their coverage and citation quality using newly introduced gold standards and metrics, and finds Dimensions to be the most comprehensive source, with overall citation quality on par with NIH-OCC.
Contribution
It introduces a large-scale framework for comparing PubMed citation sources, assessing both coverage and citation quality with three gold standards and five metrics.
Findings
Dimensions covers 62.4% of PubMed documents, outperforming NIH-OCC.
Over 90% of citations in other sources are also in Dimensions.
Dimensions and NIH-OCC have the best overall citation quality.
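The coverage and overlap findings above can be read as simple set operations over citation links. The sketch below is a minimal illustration under assumed definitions, not the authors' method. Each source is modeled as a set of (citing PMID, cited PMID) pairs. A document counts as covered if the source lists at least one reference for it. Overlap is the share of one source's links that also appear in another source. The names `coverage`, `overlap`, `source_a` and `source_b` are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a citation link is a (citing_pmid, cited_pmid) pair.
Link = Tuple[int, int]


def coverage(links: Set[Link], corpus: Set[int]) -> float:
    """Share of corpus documents that have at least one outgoing reference in `links`.

    Assumed definition of 'covered': the source lists references for the document.
    """
    if not corpus:
        return 0.0
    covered = {citing for citing, _ in links if citing in corpus}
    return len(covered) / len(corpus)


def overlap(a: Set[Link], b: Set[Link]) -> float:
    """Share of links in source `a` that also appear in source `b`."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


if __name__ == "__main__":
    # Toy data for illustration only; these are not figures from the study.
    corpus = {1, 2, 3, 4, 5}
    source_a = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 1)}
    source_b = {(1, 2), (2, 3), (5, 1)}
    print(coverage(source_a, corpus))   # 0.8
    print(coverage(source_b, corpus))   # 0.6
    print(overlap(source_b, source_a))  # 0.666...
```

In this toy case, two of the three links in `source_b` are also in `source_a`. This is the same kind of ratio as the "over 90% of citations in other sources are also in Dimensions" finding, though the study's exact counting rules may differ.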
Abstract
As an important biomedical database, PubMed provides users with free access to abstracts of its documents. However, citations between these documents need to be collected from external data sources. Although previous studies have investigated the coverage of various data sources, the quality of citations is underexplored. In response, this study compares the coverage and citation quality of five freely available data sources on 30 million PubMed documents: the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), Dimensions, Microsoft Academic Graph (MAG), the National Institutes of Health Open Citation Collection (NIH-OCC), and the Semantic Scholar Open Research Corpus (S2ORC). Three gold standards and five metrics are introduced to evaluate the correctness and completeness of citations. Our results indicate that Dimensions is the most comprehensive data source that provides…
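This summary does not spell out the five metrics. The sketch below therefore only illustrates one common way to score a citation source against a gold standard: correctness as a precision-style ratio and completeness as a recall-style ratio over citation links. This is an assumption for illustration, not the paper's metric definitions. The functions `correctness` and `completeness`, the helper `_in_scope`, and the (citing PMID, cited PMID) link representation are all hypothetical.

```python
from typing import Set, Tuple

# Assumption: a citation link is a (citing_pmid, cited_pmid) pair.
Link = Tuple[int, int]


def _in_scope(links: Set[Link], scope: Set[int]) -> Set[Link]:
    """Keep only links whose citing document is in `scope`."""
    return {link for link in links if link[0] in scope}


def correctness(source: Set[Link], gold: Set[Link]) -> float:
    """Precision-style score: share of the source's links confirmed by the gold standard.

    Assumption: only citing documents that appear in the gold standard are evaluated,
    since a gold standard usually covers a subset of the corpus.
    """
    scope = {citing for citing, _ in gold}
    predicted = _in_scope(source, scope)
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


def completeness(source: Set[Link], gold: Set[Link]) -> float:
    """Recall-style score: share of gold-standard links that the source also contains."""
    if not gold:
        return 0.0
    return len(gold & source) / len(gold)


if __name__ == "__main__":
    # Toy data for illustration only; these are not figures from the study.
    gold = {(1, 2), (1, 3), (2, 3)}
    source = {(1, 2), (1, 4), (5, 6)}
    print(correctness(source, gold))   # 0.5  ((5, 6) is out of scope)
    print(completeness(source, gold))  # 0.333...
```

Restricting correctness to citing documents in the gold standard avoids penalizing a source for links the gold standard cannot judge. Whether the study handles scope this way is not stated here.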