Collecting large-scale publication data at the level of individual researchers: A practical proposal for author name disambiguation

Ciriaco Andrea D'Angelo; Nees Jan van Eck

arXiv:2103.14558·cs.DL·March 29, 2021·Scientometrics

TL;DR

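This paper proposes a practical method for author name disambiguation that uses an external source of information to select and validate publication clusters. On a random sample of Italian scholars it reaches precision, recall, and F-Measure above 96%, which makes it a possible starting point for large-scale bibliometric analysis at the individual researcher level.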

Contribution

It proposes using an external source of information to select and validate the clusters of publications produced by an unsupervised author name disambiguation method.

Findings

01

Achieved over 96% precision, recall, and F-Measure.

02

Evaluated on a random sample of Italian scholars, with encouraging results.

03

Can serve as a starting point for a large-scale census of publication portfolios for researcher-level bibliometric analyses.

Abstract

The disambiguation of author names is an important and challenging task in bibliometrics. We propose an approach that relies on an external source of information for selecting and validating clusters of publications identified through an unsupervised author name disambiguation method. The application of the proposed approach to a random sample of Italian scholars shows encouraging results, with an overall precision, recall, and F-Measure of over 96%. The proposed approach can serve as a starting point for large-scale census of publication portfolios for bibliometric analyses at the level of individual researchers.
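
The selection and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cluster` and `ExternalRecord` types, the name-and-affiliation matching rule, and the toy data are all assumptions made for illustration, since the abstract does not specify which fields of the external source are used or how clusters are matched. Only the precision, recall, and F-Measure formulas are standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """A publication cluster from an unsupervised method (hypothetical shape)."""
    surname: str
    initial: str
    affiliations: frozenset
    publications: frozenset


@dataclass(frozen=True)
class ExternalRecord:
    """One researcher as listed in an external source (hypothetical shape)."""
    surname: str
    first_name: str
    affiliation: str


def select_clusters(record, clusters):
    """Keep clusters that agree with the external record.

    Assumed rule: same surname, same first initial, and the record's
    affiliation appears among the cluster's affiliations.
    """
    return [
        c for c in clusters
        if c.surname.lower() == record.surname.lower()
        and c.initial.lower() == record.first_name[:1].lower()
        and record.affiliation in c.affiliations
    ]


def precision_recall_f(retrieved, relevant):
    """Standard precision, recall, and F-Measure over two sets of publication IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data, invented for this example.
record = ExternalRecord("Rossi", "Mario", "University A")
clusters = [
    Cluster("Rossi", "M", frozenset({"University A"}), frozenset({"p1", "p2", "p3"})),
    Cluster("Rossi", "M", frozenset({"University B"}), frozenset({"p4", "p5"})),
    Cluster("Rossi", "M", frozenset({"University A"}), frozenset({"p6"})),
]

selected = select_clusters(record, clusters)
retrieved = frozenset().union(*(c.publications for c in selected))

# Hand-checked portfolio of the researcher (also invented).
relevant = frozenset({"p1", "p2", "p3", "p7"})
precision, recall, f_measure = precision_recall_f(retrieved, relevant)
print(precision, recall, f_measure)
```

In this toy case two of the three homonymous clusters pass the external check. One wrongly attributed publication (`p6`) lowers precision, and one missed publication (`p7`) lowers recall, which is the trade-off the reported metrics summarize.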
