S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Shivashankar Subramanian; Daniel King; Doug Downey; Sergey Feldman

arXiv:2103.07534 · cs.DL · February 22, 2022 · 1 citation



TL;DR

S2AND introduces a unified benchmark dataset and evaluation system for author name disambiguation, enabling more robust and generalizable algorithms across diverse scholarly datasets.

Contribution

It provides a harmonized dataset, an open-source reference model, and an evaluation framework for fair comparison and improved performance in author name disambiguation.

Findings

01 Training on the union of datasets yields more robust models that perform well even on datasets unseen in training.

02 The resulting model reduces error by over 50% in B³ F1 compared to the production algorithm in Semantic Scholar.

03 Evaluation across facets reveals performance disparities and fairness issues.

Abstract

Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2) database. Our evaluation suite for S2AND reports performance split by facets like publication year and number of papers, allowing researchers to track both global performance and measures…
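The facet-split evaluation described in the abstract can be illustrated with a small sketch. It scores a predicted clustering of author mentions against ground truth using B-cubed (B³) precision, recall, and F1, a common clustering metric for this task, and then reports the same scores per facet value. Everything here is an assumption made for illustration: the function names, the toy mention ids, and the publication-era facet are hypothetical and are not the S2AND API or data.

```python
from collections import defaultdict


def invert(labels):
    """Map each cluster label to the set of mention ids assigned to it."""
    clusters = defaultdict(set)
    for mention, label in labels.items():
        clusters[label].add(mention)
    return clusters


def b3(true_labels, pred_labels, subset=None):
    """B-cubed precision, recall, and F1, averaged over `subset`.

    Clusters are always built from all mentions; `subset` (assumed
    non-empty) only selects which mentions contribute to the average.
    """
    true_clusters = invert(true_labels)
    pred_clusters = invert(pred_labels)
    mentions = list(true_labels) if subset is None else list(subset)
    precision = recall = 0.0
    for m in mentions:
        t = true_clusters[true_labels[m]]
        p = pred_clusters[pred_labels[m]]
        overlap = len(t & p)  # always >= 1, since m is in both sets
        precision += overlap / len(p)
        recall += overlap / len(t)
    precision /= len(mentions)
    recall /= len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def b3_by_facet(true_labels, pred_labels, facet_of):
    """Group mentions by facet value and score each group separately."""
    groups = defaultdict(list)
    for m in true_labels:
        groups[facet_of[m]].append(m)
    return {f: b3(true_labels, pred_labels, ms) for f, ms in groups.items()}


# Hypothetical toy data: four author mentions, two real people (A and B).
true_labels = {"m1": "A", "m2": "A", "m3": "B", "m4": "B"}
# The predicted clustering wrongly merges m3 into the first person.
pred_labels = {"m1": "x", "m2": "x", "m3": "x", "m4": "y"}
# Hypothetical facet: the publication era of each mention's paper.
era = {"m1": "pre-2000", "m2": "pre-2000", "m3": "2000+", "m4": "2000+"}

overall = b3(true_labels, pred_labels)
per_facet = b3_by_facet(true_labels, pred_labels, era)
print("overall:", overall)
for facet, scores in per_facet.items():
    print(facet, scores)
```

In this toy case the overall score hides a gap between the two eras, which is the kind of disparity a facet-level report is meant to expose.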


Code & Models

Repositories

allenai/S2AND (official)


Taxonomy

Topics: Data Quality and Management · Topic Modeling · Biomedical Text Mining and Ontologies