Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness
Lucía Céspedes, Diego Kozlowski, Carolina Pradier, Maxime Holmberg Sainte-Marie, Natsumi Solange Shokida, Pierre Benz, Constance Poitras, Anton Boudreau Ninkov, Saeideh Ebrahimy, Philips Ayeni, Sarra Filali, Bing Li, Vincent Larivière

TL;DR
This study evaluates the accuracy and completeness of OpenAlex's language metadata by comparing it with WoS. OpenAlex covers languages more broadly, but inaccuracies in its metadata distort how languages are represented.
Contribution
The paper assesses the quality of OpenAlex's language metadata, highlighting its broader coverage compared to traditional databases and identifying areas for improvement.
Findings
OpenAlex has more balanced linguistic coverage than WoS.
Language metadata in OpenAlex is sometimes inaccurate.
OpenAlex tends to overestimate English and underestimate other languages.
Abstract
Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate,…
Taxonomy
Topics: Semantic Web and Ontologies · Library Science and Information Systems · Digital Humanities and Scholarship
