Accuracy Assessment of OpenAlex and Clarivate Scholar ID with an LLM-Assisted Benchmark

Renyu Zhao; Yunxin Chen

arXiv:2502.11610·cs.IR·March 5, 2025

Open Access

TL;DR

This paper evaluates the accuracy of OpenAlex and Clarivate Scholar ID systems in correctly identifying individual scholars across different groups using a large language model-assisted annotation method.

Contribution

It introduces a Search-enhanced Large Language Model approach to assess and compare the effectiveness of scholarly ID systems for name disambiguation.

Findings

01. OpenAlex and Clarivate IDs show varying accuracy across groups.

02. The LLM-assisted annotation improves the reliability of evaluation.

03. Results highlight strengths and limitations of current ID systems.

Abstract

In quantitative science of science (SciSci) studies, accurately identifying individual scholars is paramount for scientific data analysis. However, variability in how names are represented (due to common names, abbreviations, and different spelling conventions) complicates this task. While identifier systems like ORCID are being developed, many scholars remain unregistered, and numerous publications are not included. Scholarly databases such as Clarivate and OpenAlex have introduced their own ID systems as preliminary name disambiguation solutions. This study evaluates the effectiveness of these systems across different groups to determine their suitability for various application scenarios. We sampled authors from the top quartile (Q1) of Web of Science (WOS) journals based on country, discipline, and number of corresponding author papers. For each group, we selected 100 scholars and…

Taxonomy

Topics: Data Quality and Management · Semantic Web and Ontologies · Data Mining Algorithms and Applications