Effective Unsupervised Author Disambiguation with Relative Frequencies

Tobias Backes

arXiv:1808.04216 · cs.IR · August 14, 2018


TL;DR

This paper introduces a simple probabilistic similarity measure for author disambiguation that effectively clusters author mentions in bibliographic data, achieving state-of-the-art results without complex training.

Contribution

The work presents a novel probabilistic similarity measure for author disambiguation based on feature overlap. It is evaluated with a straightforward agglomerative clustering approach and minimal parameter tuning.
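This summary does not give the measure's formula. As an illustration only, the sketch below assumes one plausible reading of "relative frequencies". Each mention is represented as a set of features, such as co-author names, title words and venue. p(f) is the fraction of mentions of the same name that contain feature f. Two mentions are scored as 1 - ∏ p(f) over their shared features, which is one minus the chance that every overlap is a coincidence. The functions `relative_frequencies`, `overlap_similarity` and `agglomerative` are hypothetical, as are the average-linkage rule and the 0.5 stopping threshold. None of them is the author's published method.

```python
from collections import Counter
from itertools import combinations
from math import prod


def relative_frequencies(mentions):
    """Fraction of mentions (within one name block) that contain each feature."""
    counts = Counter(f for feats in mentions for f in set(feats))
    return {f: c / len(mentions) for f, c in counts.items()}


def overlap_similarity(a, b, p):
    """HYPOTHETICAL measure: 1 minus the product of the relative frequencies of
    the shared features, i.e. the chance that all overlaps are coincidences."""
    return 1.0 - prod(p[f] for f in set(a) & set(b))  # no overlap -> 0.0


def agglomerative(mentions, threshold=0.5):
    """Average-linkage agglomerative clustering (assumed linkage); stops when
    no pair of clusters reaches `threshold` (an assumed, untuned value)."""
    p = relative_frequencies(mentions)
    n = len(mentions)
    sim = [[overlap_similarity(mentions[i], mentions[j], p) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per mention
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            # average similarity over all cross-cluster mention pairs
            s = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
            s /= len(clusters[x]) * len(clusters[y])
            if s > best:
                best, best_pair = s, (x, y)
        if best < threshold:
            break
        x, y = best_pair  # x < y, so deleting y leaves index x valid
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


# Four made-up mentions of one ambiguous name, each reduced to a feature set.
mentions = [
    {"coauthor:Lee", "venue:SIGIR", "word:retrieval"},
    {"coauthor:Lee", "venue:SIGIR"},
    {"coauthor:Kim", "venue:Nature", "word:protein"},
    {"coauthor:Kim", "word:protein"},
]
print(agglomerative(mentions))  # [[0, 1], [2, 3]]
```

Under this assumed scoring, a rare shared feature (small p(f)) pushes the score toward 1. A feature that every mention carries (p(f) = 1) adds no evidence. This captures the intuition that a relative-frequency weighting is meant to express.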

Findings

01 State-of-the-art performance across clustering sizes

02 Effective without discriminative training

03 Comparable to the trivial single-cluster baseline in some cases

Abstract

This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at which time during the clustering process our approach works best. In contrast to most other works in this field, we are sceptical towards the performance of author name disambiguation methods in general and compare our approach to the trivial single-cluster baseline. Our results are presented separately for each correct clustering size as we can explain that, when treating all cases…
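The abstract describes three parts of the evaluation. It uses researcher-ID labels as ground truth, compares against the trivial single-cluster baseline, and reports results separately for each correct clustering size. The metric itself is not named in this summary. The sketch below assumes pairwise F1 and reads "correct clustering size" as the number of distinct researcher-IDs behind one name. `pairwise_f1` and `compare_by_true_size` are hypothetical helpers. The two name blocks in the example are invented data, not results from the paper.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_f1(predicted, gold):
    """Pairwise F1 of a clustering; both arguments are per-mention labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def compare_by_true_size(blocks):
    """Mean pairwise F1 of a method and of the single-cluster baseline,
    grouped by the true number of authors per name.

    blocks: list of (predicted_labels, gold_labels), one per ambiguous name.
    """
    groups = defaultdict(lambda: {"method": [], "baseline": []})
    for predicted, gold in blocks:
        k = len(set(gold))  # assumed meaning of "correct clustering size"
        groups[k]["method"].append(pairwise_f1(predicted, gold))
        # trivial baseline: every mention of the name goes into one cluster
        groups[k]["baseline"].append(pairwise_f1([0] * len(gold), gold))
    return {k: {name: sum(v) / len(v) for name, v in g.items()}
            for k, g in sorted(groups.items())}


blocks = [
    ([0, 0, 1], [7, 7, 7]),        # one true author; the method over-splits
    ([0, 0, 1, 1], [3, 3, 5, 5]),  # two true authors; the method is correct
]
print(compare_by_true_size(blocks))
```

In this toy data the baseline scores perfectly when only one author carries the name and poorly when two do, while the method shows the opposite pattern. A single figure pooled over all sizes would hide that contrast.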
