Accuracy of simple, initials-based methods for author name disambiguation
Staša Milojević

TL;DR
This study evaluates the accuracy of basic initials-based author disambiguation methods using simulated datasets, showing that simple methods are highly effective and proposing a hybrid approach that improves accuracy further.
Contribution
The paper provides realistic accuracy estimates for simple initials-based methods and introduces a new hybrid method that enhances disambiguation performance.
Findings
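The first-initial method correctly identifies 97% of authors in simulations across five disciplines.
The all-initials method is typically two times less accurate than the first-initial method, except in certain identifiable datasets.
The hybrid method reduces errors by 10-30% compared with the first-initial method.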
Abstract
There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common co-authorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, is typically two times less accurate, except in certain datasets that can be identified by applying…
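To make the two simple methods concrete, here is a minimal Python sketch of how author records might be grouped under each one. It is an illustration, not the paper's implementation: the function names, the name-normalization rules (lowercasing, treating periods and hyphens as separators), and the sample records are all assumptions. The hybrid method is not sketched because its rule is not described in the text above.

```python
from collections import defaultdict

def first_initial_key(last, given):
    """First-initial method: last name plus the first initial only."""
    return (last.strip().lower(), given.strip()[:1].lower())

def all_initials_key(last, given):
    """All-initials method: last name plus every initial of the given names."""
    # Assumption: periods and hyphens separate given-name parts.
    parts = given.replace(".", " ").replace("-", " ").split()
    return (last.strip().lower(), "".join(p[0].lower() for p in parts))

def group_by(records, key_fn):
    """Group (last, given) records that share the same disambiguation key."""
    groups = defaultdict(list)
    for last, given in records:
        groups[key_fn(last, given)].append((last, given))
    return dict(groups)

# Hypothetical records, invented for illustration only.
records = [
    ("Smith", "John A."),
    ("Smith", "J."),
    ("Smith", "Jane B."),
    ("Lee", "K. H."),
]

by_first = group_by(records, first_initial_key)
by_all = group_by(records, all_initials_key)
print(len(by_first), "groups under the first-initial method")
print(len(by_all), "groups under the all-initials method")
```

On these sample records the first-initial key places all three Smith records in one group, which may merge distinct people, while the all-initials key separates "J." from "John A.", which may split one person into two identities. These are the two kinds of error that the accuracy estimates have to weigh against each other.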
