Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus
Jack Culbert, Anne Hobert, Najko Jahn, Nick Haupka, Marion Schmidt, Paul Donner, Philipp Mayr

TL;DR
This study compares the reference coverage and selected metadata of OpenAlex with those of Web of Science and Scopus, finding comparable reference counts and coverage but differences in metadata such as ORCID IDs and abstracts.
Contribution
It provides a large-scale, empirical comparison of OpenAlex's bibliometric data against established proprietary databases, assessing trustworthiness and coverage.
Findings
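On a cleaned set of 16.8 million recent publications shared by all three databases, OpenAlex has average reference counts similar to those of Web of Science and Scopus.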
Metadata coverage in OpenAlex is comparable but varies by journal.
OpenAlex captures more ORCID IDs but fewer abstracts than traditional sources.
Abstract
OpenAlex is a promising open source of scholarly metadata, and competitor to established proprietary sources, such as the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rapidly evolving source and the data contained within is expanding and also quickly changing, the question naturally arises as to the trustworthiness of its data. In this report, we will study the reference coverage and selected metadata within each database and compare them with each other to help address this open question in bibliometrics. In our large-scale study, we demonstrate that, when restricted to a cleaned dataset of 16.8 million recent publications shared by all three databases, OpenAlex has average source reference numbers and…
Taxonomy
Topics: Research Data Management Practices · Data Quality and Management
