Assessing the level of merging errors for coauthorship data: a Bayesian model

Zheng Xie

arXiv:1711.01406 · physics.soc-ph · December 27, 2018

Open Access

TL;DR

This paper introduces a Bayesian model to estimate merging errors in coauthorship data, aiding in identifying and reducing data inaccuracies to improve the quality of coauthorship network analyses.

Contribution

The paper presents a novel Bayesian approach to quantify merging errors using author features, enhancing data quality assessment in coauthorship networks.

Findings

01. The model estimates the rate of merging errors for entities from author features.

02. It identifies features that indicate heavily compromised entities.

03. It has the potential to improve the quality of empirical data.

Abstract

Robust analysis of coauthorship networks depends on high-quality data, but ground-truth data are usually unavailable. Empirical data suffer from several types of errors; a typical one is the merging error, in which different persons are identified as one entity. Specific features of authors have been used to reduce these errors. We propose a Bayesian model to calculate the information carried by any given features of authors. Based on these features, the model can be used to calculate the rate of merging errors for entities. The model therefore helps to find informative features for detecting heavily compromised entities, and it can potentially contribute to improving the quality of empirical data.
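
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea (Bayes' rule applied to a single binary author feature), not the paper's actual formulation. The function name `posterior_merged`, its parameters, and every numeric value are hypothetical and chosen purely for illustration.

```python
def posterior_merged(prior_merged, p_feature_given_merged, p_feature_given_clean):
    """Return P(entity is merged | feature observed) via Bayes' rule.

    All three inputs are assumed probabilities in [0, 1]; none of them
    come from the paper.
    """
    # Total probability of observing the feature under both hypotheses.
    evidence = (p_feature_given_merged * prior_merged
                + p_feature_given_clean * (1.0 - prior_merged))
    if evidence == 0.0:
        raise ValueError("feature has zero probability under both hypotheses")
    return p_feature_given_merged * prior_merged / evidence


# Hypothetical numbers: 10% of entities are merged a priori, and the
# feature is far more common among merged entities than clean ones.
informative = posterior_merged(0.1, 0.6, 0.05)

# A feature that is equally likely in both groups carries no information:
# the posterior equals the prior.
uninformative = posterior_merged(0.1, 0.3, 0.3)

print(f"informative feature:   {informative:.3f}")
print(f"uninformative feature: {uninformative:.3f}")
```

Under these assumed numbers, the informative feature raises the estimated merging rate from 0.1 to about 0.571, while the uninformative one leaves it at the prior. This gap between prior and posterior is one simple way to read "informative features" in the abstract.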

Taxonomy

Topics: Data Quality and Management · Data Mining Algorithms and Applications