Author name disambiguation of bibliometric data: A comparison of several unsupervised approaches
Alexander Tekles, Lutz Bornmann

TL;DR
This study compares several unsupervised author name disambiguation methods in bibliometric data, demonstrating that all outperform simple name-based approaches, with the Caron and van Eck method achieving the best results.
Contribution
It provides a controlled comparison of unsupervised disambiguation approaches, analyzing their performance and the effect of their parameters on complex disambiguation tasks.
Findings
All evaluated approaches outperform name-only methods.
The Caron and van Eck (2014) approach yields the best results.
Disambiguation performance depends on how each approach is parametrized and on the complexity of the disambiguation task.
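To make the "name-only" baseline concrete, here is a minimal sketch. It assumes a common rule: mentions sharing a normalized surname and first initial are treated as the same author. This excerpt does not state which name-based rule the study uses, so the rule, the function name `name_only_clusters`, and the sample mentions are all hypothetical.

```python
from collections import defaultdict


def name_only_clusters(mentions):
    """Group author mentions by normalized surname plus first initial.

    Hypothetical baseline rule (not necessarily the study's).
    `mentions` maps a mention ID to a (surname, given_names) tuple.
    Returns a dict mapping each name key to a sorted list of mention IDs.
    """
    clusters = defaultdict(list)
    for mention_id, (surname, given_names) in mentions.items():
        # Lowercase the surname and keep only the first letter of the given names.
        key = (surname.strip().lower(), given_names.strip()[:1].lower())
        clusters[key].append(mention_id)
    return {key: sorted(ids) for key, ids in clusters.items()}


# Hypothetical author mentions: m1 and m2 are different people with the same key.
mentions = {
    "m1": ("Smith", "John"),
    "m2": ("Smith", "Jane"),
    "m3": ("smith", "J."),
    "m4": ("Meyer", "Anna"),
}
print(name_only_clusters(mentions))
```

Under this rule, "John Smith" and "Jane Smith" end up in one cluster. Approaches that also draw on publication-level evidence, rather than names alone, are designed to separate merges of this kind.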
Abstract
Adequately disambiguating author names in bibliometric databases is a precondition for conducting reliable analyses at the author level. In bibliometric studies that include many researchers, it is not possible to disambiguate each researcher manually. Several approaches have been proposed for author name disambiguation, but they have not yet been compared under controlled conditions. In this study, we compare a set of unsupervised disambiguation approaches. Unsupervised approaches specify a model to assess the similarity of author mentions a priori instead of training a model with labelled data. To evaluate the approaches, we applied them to a set of author mentions annotated with a ResearcherID, an author identifier maintained by the researchers themselves. Apart from comparing the overall performance, we take a more detailed look at…
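The evaluation step described above compares the clusters an approach produces with the grouping implied by ResearcherIDs. The sketch below shows one way to score this. It assumes pairwise precision, recall and F1, a common choice in disambiguation studies, but this excerpt does not say which measures the paper reports. The mention IDs, ResearcherID values, and cluster labels are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of unordered mention pairs that share a label."""
    groups = {}
    for mention_id, label in labels.items():
        groups.setdefault(label, []).append(mention_id)
    pairs = set()
    for members in groups.values():
        # Sort so that each pair is stored in one canonical order.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, gold):
    """Pairwise precision, recall and F1 of predicted clusters vs. gold labels."""
    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels (ResearcherIDs) and a predicted clustering.
gold = {"m1": "A-1", "m2": "B-2", "m3": "A-1", "m4": "C-3"}
predicted = {"m1": 0, "m2": 0, "m3": 0, "m4": 1}
print(pairwise_scores(predicted, gold))
```

In this toy case, the predicted clustering merges m2 with the true pair m1/m3. It therefore keeps full recall but loses precision, which is the typical failure pattern of an overly coarse grouping.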
