Assessing the level of merging errors for coauthorship data: a Bayesian model
Zheng Xie

TL;DR
This paper introduces a Bayesian model to estimate merging errors in coauthorship data, aiding in identifying and reducing data inaccuracies to improve the quality of coauthorship network analyses.
Contribution
The paper presents a novel Bayesian approach to quantify merging errors using author features, enhancing data quality assessment in coauthorship networks.
Findings
The model estimates the rate of merging errors for entities from given author features.
It helps identify informative features for detecting heavily compromised entities.
It has the potential to improve the quality of empirical data.
Abstract
Robust analysis of coauthorship networks depends on high-quality data. However, ground-truth data are usually unavailable. Empirical data suffer from several types of errors; a typical one, called merging error, identifies different persons as one entity. Specific features of authors have been used to reduce these errors. We propose a Bayesian model to calculate the information carried by any given features of authors. Based on these features, the model can be used to calculate the rate of merging errors for entities. The model therefore helps to find informative features for detecting heavily compromised entities, and it can potentially contribute to improving the quality of empirical data.
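The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, assuming a single binary author feature and plain Bayes' rule. The function name `posterior_merged` and all probabilities are hypothetical, chosen for illustration; they are not taken from the paper.

```python
def posterior_merged(prior_merged, p_feature_given_merged, p_feature_given_clean):
    """Return P(entity is merged | feature observed) via Bayes' rule.

    All three inputs are assumed probabilities, not values from the paper:
      prior_merged            -- P(merged), the baseline merging error rate
      p_feature_given_merged  -- P(feature | merged)
      p_feature_given_clean   -- P(feature | not merged)
    """
    # Total probability of observing the feature on any entity.
    evidence = (p_feature_given_merged * prior_merged
                + p_feature_given_clean * (1.0 - prior_merged))
    if evidence == 0.0:
        raise ValueError("feature has zero probability under both hypotheses")
    return p_feature_given_merged * prior_merged / evidence


# Hypothetical numbers: 5% of entities are merged, and the feature
# (e.g. an unusually large number of coauthors) appears on 60% of merged
# entities but only 10% of clean ones.
prior = 0.05
post = posterior_merged(prior, 0.60, 0.10)
print(f"P(merged | feature) = {post:.2f}")
```

Under these assumed numbers, observing the feature raises the estimated merging error rate well above the prior. A feature whose two conditional probabilities are nearly equal would leave the prior almost unchanged, which is one simple way to read "informative" in this setting.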
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications
