S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
Shivashankar Subramanian, Daniel King, Doug Downey, Sergey Feldman

TL;DR
S2AND introduces a unified benchmark dataset and evaluation system for author name disambiguation, enabling more robust and generalizable algorithms across diverse scholarly datasets.
Contribution
It provides a harmonized dataset, an open-source reference model, and an evaluation framework for fair comparison and improved performance in author name disambiguation.
Findings
Training on combined datasets improves model robustness.
The resulting model reduces error by over 50% (in B³ F1) compared to the production algorithm in Semantic Scholar.
Evaluation across facets reveals performance disparities and fairness issues.
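To make the idea of faceted evaluation concrete, here is a minimal sketch that scores a predicted clustering of author mentions against a gold clustering and then reports F1 separately per facet value. It is an illustration under stated assumptions, not the S2AND evaluation code: the function names (`b_cubed`, `f1_by_facet`), the toy mention ids, and the two publication-year buckets are all hypothetical, and averaging per-mention B-cubed precision and recall within each facet is a simplification chosen for brevity.

```python
from collections import defaultdict


def b_cubed(pred, gold):
    """Per-mention B-cubed (precision, recall).

    pred and gold each map mention id -> cluster id.
    """
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for mention, cluster in pred.items():
        pred_clusters[cluster].add(mention)
    for mention, cluster in gold.items():
        gold_clusters[cluster].add(mention)

    scores = {}
    for mention in gold:
        p = pred_clusters[pred[mention]]  # mentions predicted as the same author
        g = gold_clusters[gold[mention]]  # mentions that truly share the author
        overlap = len(p & g)
        scores[mention] = (overlap / len(p), overlap / len(g))
    return scores


def f1_by_facet(scores, facet):
    """Average per-mention precision/recall within each facet value, then F1."""
    buckets = defaultdict(list)
    for mention, pr in scores.items():
        buckets[facet[mention]].append(pr)

    out = {}
    for key, prs in buckets.items():
        precision = sum(p for p, _ in prs) / len(prs)
        recall = sum(r for _, r in prs) / len(prs)
        total = precision + recall
        out[key] = 2 * precision * recall / total if total else 0.0
    return out


# Hypothetical toy data: four mentions, two real authors (A and B).
gold = {"m1": "A", "m2": "A", "m3": "B", "m4": "B"}
# The prediction wrongly merges m3 into author A's cluster and isolates m4.
pred = {"m1": 0, "m2": 0, "m3": 0, "m4": 1}
# Assumed facet: a coarse publication-year bucket per mention.
year_bucket = {"m1": "pre-2000", "m2": "pre-2000", "m3": "2000+", "m4": "2000+"}

result = f1_by_facet(b_cubed(pred, gold), year_bucket)
for bucket, f1 in sorted(result.items()):
    print(f"{bucket}: F1 = {f1:.3f}")
```

On this toy input the two buckets receive different scores (0.800 for "pre-2000" and about 0.571 for "2000+"), which is the kind of disparity that a single global number would hide.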
Abstract
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2) database. Our evaluation suite for S2AND reports performance split by facets like publication year and number of papers, allowing researchers to track both global performance and measures…
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Biomedical Text Mining and Ontologies
