Effect of forename string on author name disambiguation
Jinseok Kim, Jenna Kim

TL;DR
This paper investigates how using full forenames rather than abbreviated forms affects the accuracy of author name disambiguation. It shows that more complete forenames substantially improve disambiguation performance, especially in complex cases.
Contribution
The paper analyzes how forenames affect disambiguation and highlights practical strategies, such as restoring full forenames, for better accuracy.
Findings
Increasing full forename ratios improves disambiguation performance.
Algorithmic disambiguation benefits more from full forenames in complex cases.
Partial forenames maintain similar performance to full forenames in many scenarios.
Abstract
In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string-matching) and algorithmic disambiguation is not well understood. This study assesses the contribution of forenames to author name disambiguation using multiple labeled datasets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). Results show that increasing the ratio of full forenames substantially improves the performance of both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are…
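To make the synonym and homonym trade-off concrete, here is a minimal sketch of heuristic (string-matching) grouping under two forename keys. It is an illustration under stated assumptions, not the paper's method: the function names, the key formats, and the name instances are all hypothetical.

```python
from collections import defaultdict


def first_initial_key(surname, forename):
    """Hypothetical key: surname plus the first forename initial."""
    return f"{surname.lower()}_{forename[:1].lower()}"


def full_forename_key(surname, forename):
    """Hypothetical key: surname plus the first forename token as written."""
    tokens = forename.replace(".", " ").lower().split()
    first = tokens[0] if tokens else ""
    return f"{surname.lower()}_{first}"


def group_by(records, key_fn):
    """Group record ids whose (surname, forename) pairs map to the same key."""
    groups = defaultdict(list)
    for rec_id, surname, forename in records:
        groups[key_fn(surname, forename)].append(rec_id)
    return dict(groups)


# Made-up name instances: (record id, surname, forename as it appears).
records = [
    (1, "Lee", "Sanghoon"),
    (2, "Lee", "Sarah"),
    (3, "Lee", "S."),
    (4, "Lee", "Sanghoon"),
]

by_initial = group_by(records, first_initial_key)
by_full = group_by(records, full_forename_key)

print(by_initial)
print(by_full)
```

In this toy data, the initial-based key places all four instances in one group, so distinct forenames are merged (the homonym problem). The full-forename key separates "Sanghoon" from "Sarah" but leaves the initialized record "S." in a group of its own (the synonym problem), which is the situation where restoring a full forename would matter.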
