Whois? Deep Author Name Disambiguation using Bibliographic Data

Zeyd Boukhers; Nagaraj Asundi Bahubali

arXiv:2207.04772 · cs.DL · July 26, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a neural network-based method for author name disambiguation in bibliographic data, leveraging co-author networks and research domains to improve accuracy in digital libraries.

Contribution

It presents a novel approach that combines co-author and research area information with neural networks for author disambiguation.

Findings

01. Effective disambiguation on a large bibliographic dataset

02. Improved accuracy over baseline methods

03. Scalable approach for digital libraries

Abstract

As the number of authors grows exponentially over the years, the number of authors sharing the same name grows proportionally. This makes it challenging to assign newly published papers to the correct authors. Author Name Ambiguity (ANA) is therefore considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initial. The author within each group is then identified by capturing the relation with his/her co-authors and area of research, which is represented by the titles…
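The grouping step described in the abstract (authors sharing a last name and first-name initial) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `blocking_key` and `group_by_blocking_key` are hypothetical, the sample names are made up, and the sketch assumes names are written as whitespace-separated tokens with the given name first and the family name last.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Build a key from the first-name initial and the last name.

    Assumption: the first token is the given name and the last token is
    the family name. Real bibliographic data needs more careful parsing.
    """
    parts = full_name.strip().split()
    if not parts:
        return ""
    first_initial = parts[0][0].lower()
    last_name = parts[-1].lower()
    return f"{first_initial} {last_name}"


def group_by_blocking_key(names):
    """Group name strings that share the same blocking key."""
    groups = defaultdict(set)
    for name in names:
        key = blocking_key(name)
        if key:  # skip empty or whitespace-only names
            groups[key].add(name)
    return dict(groups)


# Illustrative names only, not taken from the DBLP collection.
authors = [
    "John Smith",
    "Jane Smith",
    "J. Smith",
    "Wei Wang",
    "Wen Wang",
    "Zeyd Boukhers",
]
blocks = group_by_blocking_key(authors)

for key, members in sorted(blocks.items()):
    print(key, sorted(members))
```

Each resulting group holds the candidate name strings that could refer to the same or to different people. Per the abstract, the decision within a group then relies on co-authors and research area, which this sketch does not attempt to model.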

Taxonomy

Topics: Data Quality and Management · Biomedical Text Mining and Ontologies · Topic Modeling