Collecting large-scale publication data at the level of individual researchers: A practical proposal for author name disambiguation
Ciriaco Andrea D'Angelo, Nees Jan van Eck

TL;DR
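This paper introduces a practical method for author name disambiguation that uses an external data source to select and validate publication clusters. On a random sample of Italian scholars it reaches precision, recall, and F-Measure of over 96%, supporting large-scale bibliometric analysis at the level of individual researchers.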
Contribution
It presents an approach that combines an unsupervised author name disambiguation method with an external source of information, which is used to select and validate the resulting clusters of publications.
Findings
Achieved an overall precision, recall, and F-Measure of over 96%.
Evaluated on a random sample of Italian scholars.
Can serve as a starting point for a large-scale census of publication portfolios at the level of individual researchers.
Abstract
The disambiguation of author names is an important and challenging task in bibliometrics. We propose an approach that relies on an external source of information for selecting and validating clusters of publications identified through an unsupervised author name disambiguation method. The application of the proposed approach to a random sample of Italian scholars shows encouraging results, with an overall precision, recall, and F-Measure of over 96%. The proposed approach can serve as a starting point for large-scale census of publication portfolios for bibliometric analyses at the level of individual researchers.
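
The select-and-validate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cluster` shape, the exact-match rule on name and affiliation, and all names and publication IDs below are hypothetical assumptions chosen only to show how an external record might filter clusters and how precision, recall, and F-Measure are then computed against a known portfolio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One cluster from an unsupervised disambiguation step (hypothetical shape)."""
    author_name: str
    affiliations: frozenset
    publications: frozenset


def select_clusters(clusters, name, affiliation):
    """Keep clusters that agree with the external record.

    Assumption: a cluster is validated when its author name and one of its
    affiliations match the external record exactly (case-insensitive).
    """
    selected = []
    for cluster in clusters:
        name_ok = cluster.author_name.lower() == name.lower()
        affiliation_ok = affiliation.lower() in {a.lower() for a in cluster.affiliations}
        if name_ok and affiliation_ok:
            selected.append(cluster)
    return selected


def portfolio(selected):
    """Merge the publications of all selected clusters into one set."""
    merged = set()
    for cluster in selected:
        merged |= cluster.publications
    return merged


def precision_recall_f(retrieved, relevant):
    """Standard set-based precision, recall, and F-Measure (harmonic mean)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up example: three clusters share a name, two share the affiliation.
clusters = [
    Cluster("Rossi, M", frozenset({"University A"}), frozenset({"p1", "p2", "p3"})),
    Cluster("Rossi, M", frozenset({"University B"}), frozenset({"p4", "p5"})),
    Cluster("Rossi, M", frozenset({"University A"}), frozenset({"p6"})),
]
retrieved = portfolio(select_clusters(clusters, "Rossi, M", "University A"))
true_portfolio = {"p1", "p2", "p3", "p7"}  # hypothetical ground truth
print(precision_recall_f(retrieved, true_portfolio))
```

In this toy case the external record discards the homonym at University B. One wrongly retained publication lowers precision and one missed publication lowers recall, which is the trade-off the three reported measures capture.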
