CrowdER: Crowdsourcing Entity Resolution
Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng

TL;DR
CrowdER introduces a hybrid human-machine approach to entity resolution: machines make an initial filtering pass over all the data, and people verify only the most likely matches. This reduces the number of human verification tasks while improving accuracy.
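A minimal sketch of the machine filtering pass follows. It assumes Jaccard similarity over lowercase word tokens and a fixed threshold. The abstract says only that machines make "an initial, coarse pass", so the similarity function, the threshold, and the helper names (`tokens`, `jaccard`, `candidate_pairs`) are illustrative assumptions, not the paper's code. Pairs that clear the threshold are the ones that would be sent to people for verification.

```python
from itertools import combinations


def tokens(record: str) -> set[str]:
    # Assumption: lowercase whitespace tokens, chosen only for illustration.
    return set(record.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(records: list[str], threshold: float) -> list[tuple[int, int, float]]:
    """Hypothetical machine pass: keep only pairs whose similarity reaches
    the threshold; these are the likely matches handed to human workers."""
    toks = [tokens(r) for r in records]
    kept = []
    for i, j in combinations(range(len(records)), 2):
        score = jaccard(toks[i], toks[j])
        if score >= threshold:
            kept.append((i, j, score))
    # Most likely matches first.
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept


records = [
    "iPad 2 16GB WiFi White",
    "Apple iPad 2 16GB WiFi White",
    "Samsung Galaxy Tab 10.1",
    "iPad Two 16GB WiFi White",
]
pairs = candidate_pairs(records, threshold=0.5)
print([(i, j) for i, j, _ in pairs])  # [(0, 1), (0, 3), (1, 3)]
```

Of the six possible pairs, only three survive the filter, and none of them involves the Samsung record, so a human would see half as many pairs as an exhaustive comparison would require.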
Contribution
The paper shows that generating the minimum number of batched verification tasks of a given size is NP-hard, and presents a novel two-tiered heuristic for creating such tasks in a hybrid human-machine entity resolution system.
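The batching problem can be illustrated with a cluster-style task that shows up to k records at once, where a candidate pair counts as verified once both of its records appear together in some task. The sketch below is a plain greedy heuristic, not the paper's two-tiered method. The task model, the function name `greedy_cluster_tasks`, and the tie-breaking rule are assumptions made for illustration.

```python
def greedy_cluster_tasks(pairs: list[tuple[int, int]], k: int) -> list[list[int]]:
    """Hypothetical greedy batching (NOT the paper's two-tiered heuristic).

    Builds tasks of at most k records so that every candidate pair has both
    of its records in at least one task.
    """
    if k < 2:
        raise ValueError("a task must hold at least two records")
    uncovered = {(min(a, b), max(a, b)) for a, b in pairs}
    tasks = []
    while uncovered:
        # Seed each task with the smallest still-uncovered pair.
        task = set(min(uncovered))
        while len(task) < k:
            best, best_gain = None, 0
            candidates = {r for p in uncovered for r in p} - task
            # Add the record that covers the most new pairs; ties go to the smaller id.
            for r in sorted(candidates):
                gain = sum((min(r, t), max(r, t)) in uncovered for t in task)
                if gain > best_gain:
                    best, best_gain = r, gain
            if best is None:
                break  # no remaining record would cover a new pair
            task.add(best)
        # Every uncovered pair with both records in this task is now verified.
        uncovered -= {p for p in uncovered if p[0] in task and p[1] in task}
        tasks.append(sorted(task))
    return tasks


print(greedy_cluster_tasks([(0, 1), (0, 3), (1, 3), (2, 4)], k=3))
# [[0, 1, 3], [2, 4]]
```

Here the three pairs among records 0, 1 and 3 fit into a single task of three records, so four candidate pairs need only two tasks. Because finding the minimum number of tasks is NP-hard, a greedy pass like this carries no optimality guarantee.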
Findings
The hybrid approach outperforms machine-only and human-only methods
Batching substantially reduces the number of verification tasks
High accuracy is achieved with fewer human verifications
Abstract
Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate but expensive (and slow) way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers but even with batching, a human-only approach is infeasible for data sets of even moderate size, due to the large numbers of matches to be tested. Instead, we propose a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. We show that for such a hybrid system, generating the minimum number of verification tasks of a given size is NP-Hard, but we develop a novel two-tiered heuristic approach for creating batched tasks.…
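To make "data sets of even moderate size" concrete: a human-only approach that checks every pair among n records faces n(n-1)/2 pairs. The figures below are illustrative arithmetic, not results from the paper; the data set size of 10,000 records and the batch size of 10 pairs per task are assumptions.

$$
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 10{,}000 \;\Rightarrow\; \frac{10{,}000 \times 9{,}999}{2} = 49{,}995{,}000 \text{ pairs}
$$

$$
\frac{49{,}995{,}000 \text{ pairs}}{10 \text{ pairs per task}} = 4{,}999{,}500 \text{ tasks}
$$

Even with batching, roughly five million tasks is far beyond what a crowd can process affordably, which is why the hybrid approach sends people only the pairs that survive the machine pass.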
Taxonomy
Topics: Data Quality and Management · Mobile Crowdsensing and Crowdsourcing · Privacy-Preserving Technologies in Data
