A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments
Ciriaco Andrea D'Angelo, Cristiano Giuffrida, Giovanni Abramo

TL;DR
This paper presents a practical heuristic method for author name disambiguation in bibliometric datasets, enabling large-scale research assessments with improved scalability and adequate accuracy.
Contribution
It introduces a simple heuristic approach to author name disambiguation tailored to large bibliometric datasets, which the abstract describes as easier to implement and considerably more scalable and expandable than state-of-the-art alternatives.
Findings
Method is easy to implement and scalable.
Achieves adequate precision and recall for large-scale assessments.
Applied to the Italian university system: 80 universities and a research staff of over 60,000 scientists.
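To make the precision and recall finding concrete, here is a minimal sketch of how the two measures could be computed for a disambiguation result. It assumes the output is represented as a set of (publication, author) pairs compared against a hand-checked reference set. The function name, the identifiers, and the sample data are all hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, actual):
    """Compare predicted (publication_id, author_id) pairs with a reference set.

    Precision: share of predicted attributions that are correct.
    Recall: share of true attributions that were found.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
predicted = {("p1", "a1"), ("p2", "a1"), ("p3", "a2"), ("p4", "a2")}
actual = {("p1", "a1"), ("p2", "a1"), ("p3", "a3"), ("p5", "a2"), ("p6", "a3")}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted attributions are correct, and two of the five true attributions are recovered. In a research assessment, low precision credits publications to the wrong scientist, while low recall leaves a scientist's output undercounted.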
Abstract
National exercises for the evaluation of research activity by universities are becoming regular practice in ever more countries. These exercises have mainly been conducted through the application of peer-review methods. Bibliometrics has not been able to offer a valid large-scale alternative because of almost overwhelming difficulties in identifying the true author of each publication. We will address this problem by presenting a heuristic approach to author name disambiguation in bibliometric datasets for large-scale research assessments. The application proposed concerns the Italian university system, consisting of 80 universities and a research staff of over 60,000 scientists. The key advantage of the proposed approach is the ease of implementation. The algorithms are of practical application and have considerably better scalability and expandability properties than state-of-the-art…
