Effective Unsupervised Author Disambiguation with Relative Frequencies
Tobias Backes

TL;DR
This paper introduces a simple probabilistic similarity measure for author disambiguation that effectively clusters author mentions in bibliographic data, achieving state-of-the-art results without complex training.
Contribution
The work presents a novel, feature overlap-based probabilistic similarity measure for author disambiguation, evaluated with a straightforward clustering approach and minimal parameter tuning.
Findings
State-of-the-art performance across clustering sizes
Effective without discriminative training
Comparable to the trivial single-cluster baseline in some cases
Abstract
This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at which time during the clustering process our approach works best. In contrast to most other works in this field, we are sceptical towards the performance of author name disambiguation methods in general and compare our approach to the trivial single-cluster baseline. Our results are presented separately for each correct clustering size as we can explain that, when treating all cases…
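The general idea of scoring feature overlap and merging mentions agglomeratively can be sketched as follows. This is only a minimal illustration under stated assumptions, not the measure defined in the paper: the weighting `1 - relative frequency`, the feature strings, the function names, and the `threshold` stopping rule are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def relative_frequencies(mentions):
    """Fraction of mentions in which each feature occurs."""
    counts = Counter(f for m in mentions for f in set(m))
    n = len(mentions)
    return {f: c / n for f, c in counts.items()}


def overlap_similarity(a, b, rel_freq):
    """Score shared features; rare features count more.

    ASSUMPTION: the weight (1 - relative frequency) is an illustrative
    stand-in, not the probabilistic measure from the paper.
    """
    return sum(1.0 - rel_freq[f] for f in a & b)


def agglomerate(mentions, threshold):
    """Greedily merge the most similar pair until no pair reaches threshold."""
    rel_freq = relative_frequencies(mentions)
    # Each cluster is (set of mention indices, union of their features).
    clusters = [({i}, set(m)) for i, m in enumerate(mentions)]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: overlap_similarity(
                clusters[p[0]][1], clusters[p[1]][1], rel_freq
            ),
        )
        score = overlap_similarity(clusters[i][1], clusters[j][1], rel_freq)
        if score < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        # Remove the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return sorted(sorted(ids) for ids, _ in clusters)


# Hypothetical mentions of one ambiguous name, each a set of features.
mentions = [
    {"coauthor:mueller", "venue:jcdl"},
    {"coauthor:mueller", "venue:sigir"},
    {"coauthor:chen", "venue:icml"},
    {"coauthor:chen", "venue:nips"},
]
result = agglomerate(mentions, threshold=0.4)
```

With a very low threshold, every mention ends up in one cluster, which corresponds to the trivial single-cluster baseline that the abstract uses as a point of comparison.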
