Crowdsourced Collective Entity Resolution with Relational Match Propagation
Jiacheng Huang, Wei Hu, Zhifeng Bao, Yuzhong Qu

TL;DR
This paper introduces a crowdsourced collective entity resolution method that leverages entity relationships and probabilistic propagation to improve accuracy and reduce labeling costs in knowledge bases.
Contribution
The paper proposes a collective entity resolution (ER) approach based on relational match propagation. It addresses challenges such as candidate pruning and error tolerance, and the authors report that it outperforms existing methods.
Findings
Achieves higher accuracy with less labeling than state-of-the-art methods.
Effectively propagates labeling information through entity relationships.
Reduces labor costs in crowdsourced entity resolution.
Abstract
Knowledge bases (KBs) store rich yet heterogeneous entities and facts. Entity resolution (ER) aims to identify entities in KBs that refer to the same real-world object. Recent studies have shown significant benefits of involving humans in the loop of ER. They often resolve entities with pairwise similarity measures over attribute values and resort to the crowd to label uncertain ones. However, existing methods still suffer from high labor costs and insufficient labeling to some extent. In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently. Specifically, it iteratively asks human workers to label picked entity pairs and propagates the labeling information to their neighbors in distance. During this process, we address the problems of candidate entity pruning,…
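The propagation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `propagate_match`, the constant per-hop `decay`, and the cutoff `threshold` are all assumptions made for illustration, standing in for whatever probabilistic model the paper actually uses. The sketch only shows how one crowd-confirmed match could raise the confidence of entity pairs connected to it through relationships.

```python
from collections import deque

def propagate_match(seed, neighbors, decay=0.5, threshold=0.2):
    """Spread a crowd-confirmed match from `seed` to related entity pairs.

    `neighbors` maps an entity pair to the pairs reachable through one
    relationship. Each hop multiplies the confidence by `decay`, an
    illustrative constant (not a value from the paper). Pairs whose
    confidence would fall below `threshold` are not inferred.
    """
    confidence = {seed: 1.0}          # the labeled pair is treated as certain
    queue = deque([seed])
    while queue:
        pair = queue.popleft()
        next_conf = confidence[pair] * decay
        if next_conf < threshold:
            continue                  # too far from the label to infer anything
        for nb in neighbors.get(pair, ()):
            # Keep only the strongest evidence seen for each neighbor.
            if next_conf > confidence.get(nb, 0.0):
                confidence[nb] = next_conf
                queue.append(nb)
    return confidence

# Hypothetical chain of entity pairs linked by relationships.
graph = {
    ("a1", "b1"): [("a2", "b2")],
    ("a2", "b2"): [("a3", "b3")],
    ("a3", "b3"): [("a4", "b4")],
}
result = propagate_match(("a1", "b1"), graph)
```

In this toy chain the pair three hops from the label is left unresolved, so it would remain a candidate for a later crowd question. That is the intuition behind labeling fewer pairs when matches are inferred jointly.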
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data-Driven Disease Surveillance
