Distortive Effects of Initial-Based Name Disambiguation on Measurements of Large-Scale Coauthorship Networks

Jinseok Kim; Jana Diesner

arXiv:1502.06306 · cs.DL · April 3, 2015

TL;DR

This study tests how initial-based name disambiguation distorts the structure and statistics of large-scale coauthorship networks in five scientific fields and the interdisciplinary journal PNAS, and finds that it can misrepresent the networks' statistical properties.

Contribution

It provides empirical evidence that initial-based disambiguation significantly biases network measurements and misidentifies key authors, challenging its validity for research.

Findings

01. Initial-based disambiguation inflates network metrics such as author productivity and network density.

02. It underestimates the number of unique authors and the number of network components.

03. Asian names are particularly prone to misidentification.
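
The direction of these biases can be seen on a toy example. The sketch below is not the paper's method or data: the three papers, the author names, and the helper names (`first_initial_key`, `network_stats`) are invented for illustration, and it assumes the simplest scheme, surname plus first initial.

```python
from itertools import combinations

# Hypothetical toy data: each paper lists its authors as "Surname, Given".
papers = [
    ["Kim, Jinseok", "Lee, Anna"],
    ["Kim, Jiyoung", "Park, Min"],
    ["Lee, Anna", "Cho, Hana"],
]

def first_initial_key(name):
    """Assumed scheme: identify an author by surname + first initial."""
    surname, given = [part.strip() for part in name.split(",", 1)]
    return f"{surname.lower()}, {given[0].lower()}"

def network_stats(papers, key=lambda name: name):
    """Build a coauthorship network under `key` and summarise it."""
    authors, edges, paper_count = set(), set(), {}
    for paper in papers:
        ids = sorted({key(name) for name in paper})
        authors.update(ids)
        for author in ids:
            paper_count[author] = paper_count.get(author, 0) + 1
        edges.update(combinations(ids, 2))  # one edge per coauthor pair

    # Count connected components with a small union-find.
    parent = {author: author for author in authors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    n = len(authors)
    return {
        "authors": n,
        "components": len({find(author) for author in authors}),
        "density": 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0,
        "mean_papers_per_author": sum(paper_count.values()) / n,
    }

exact = network_stats(papers)                          # full names as given
merged = network_stats(papers, key=first_initial_key)  # initial-based
print(exact)
print(merged)
```

In this toy data the two distinct Kims collapse into one identity, so the author count and the component count drop while density and mean papers per author rise. The magnitudes say nothing about real datasets.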

Abstract

Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests this assumption by analyzing coauthorship networks from five academic fields - biology, computer science, nanoscience, neuroscience, and physics - and an interdisciplinary journal, PNAS. Name instances in datasets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use disambiguated data as a proxy of ground-truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties…
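
The two error types named in the abstract, merging and splitting, can be illustrated with a minimal sketch. The abstract does not spell out its three initial-based schemes, so the two key functions below (`first_initial_key`, `all_initials_key`) are assumed variants, and the names are hypothetical.

```python
def parse(name):
    """Split "Surname, Given M." into (surname, list of given-name initials)."""
    surname, _, given = name.partition(",")
    initials = [part[0].lower() for part in given.replace(".", " ").split()]
    return surname.strip().lower(), initials

def first_initial_key(name):
    """Assumed variant: surname + first initial only."""
    surname, initials = parse(name)
    return (surname, initials[0] if initials else "")

def all_initials_key(name):
    """Assumed variant: surname + every given-name initial."""
    surname, initials = parse(name)
    return (surname, "".join(initials))

# Merging error: two different (hypothetical) people share one key.
merge_error = first_initial_key("Wang, Wei") == first_initial_key("Wang, Wen")

# Splitting error: one (hypothetical) person who sometimes omits a middle
# initial receives two different keys.
split_error = all_initials_key("Smith, John A.") != all_initials_key("Smith, John")

print(merge_error, split_error)
```

The coarser key trades splitting errors for merging errors: under `first_initial_key` both spellings of the hypothetical Smith map to the same identity, but so do the two different Wangs.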
