Accuracy Assessment of OpenAlex and Clarivate Scholar ID with an LLM-Assisted Benchmark
Renyu Zhao, Yunxin Chen

TL;DR
This paper evaluates the accuracy of OpenAlex and Clarivate Scholar ID systems in correctly identifying individual scholars across different groups using a large language model-assisted annotation method.
Contribution
It introduces a Search-enhanced Large Language Model approach to assess and compare the effectiveness of scholarly ID systems for name disambiguation.
Findings
OpenAlex and Clarivate IDs show varying accuracy across groups.
The LLM-assisted annotation improves the reliability of evaluation.
Results highlight strengths and limitations of current ID systems.
Abstract
In quantitative SciSci (science of science) studies, accurately identifying individual scholars is paramount for scientific data analysis. However, variability in how names are represented (due to commonality, abbreviations, and different spelling conventions) complicates this task. While identifier systems like ORCID are being developed, many scholars remain unregistered, and numerous publications are not included. Scholarly databases such as Clarivate and OpenAlex have introduced their own ID systems as preliminary name disambiguation solutions. This study evaluates the effectiveness of these systems across different groups to determine their suitability for various application scenarios. We sampled authors from the top quartile (Q1) of Web of Science (WOS) journals based on country, discipline, and number of corresponding-author papers. For each group, we selected 100 scholars and…
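The group-wise sampling mentioned in the abstract can be illustrated with a minimal sketch. It assumes each author record is a dictionary and that groups are defined by a tuple of field values. The field names (`author_id`, `country`, `discipline`), the helper `sample_per_group`, and the toy data are hypothetical and are not taken from the paper; the abstract is truncated, so the authors' actual procedure may differ.

```python
import random
from collections import defaultdict


def sample_per_group(authors, key_fields, n_per_group=100, seed=0):
    """Draw up to n_per_group authors, without replacement, from each group.

    A group is the set of authors sharing the same values for key_fields.
    All names here are illustrative assumptions, not the paper's code.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for author in authors:
        groups[tuple(author[f] for f in key_fields)].append(author)

    sampled = {}
    for key in sorted(groups):  # sorted keys keep the draw order deterministic
        members = groups[key]
        k = min(n_per_group, len(members))  # small groups are taken whole
        sampled[key] = rng.sample(members, k)
    return sampled


# Toy data: three (country, discipline) groups of 150 authors each.
authors = [
    {"author_id": f"a{i}", "country": c, "discipline": d}
    for i, (c, d) in enumerate(
        [("CN", "Physics"), ("US", "Physics"), ("US", "Biology")] * 150
    )
]
result = sample_per_group(authors, ["country", "discipline"])
```

Sampling a fixed number per group, rather than proportionally, gives every group the same number of annotated scholars, so per-group accuracy estimates rest on equal sample sizes.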
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Data Mining Algorithms and Applications
