Clustering Via Crowdsourcing
Arya Mazumdar, Barna Saha

TL;DR
This paper explores efficient crowdsourcing algorithms for entity resolution, reducing query complexity using side information and handling noisy crowd responses, with theoretical bounds and parallelizable solutions.
Contribution
It introduces new information-theoretic bounds and algorithms that minimize queries for clustering with noisy crowd answers and side information.
Findings
Query complexity reduced to linear or sublinear in n
Algorithms are near-optimal and parallelizable
Bounds closely match theoretical limits
Abstract
In recent years, crowdsourcing, also known as human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensation. Therefore, a crowd-based algorithm must judiciously use any information computed through an automated process and adaptively ask the crowd the minimum number of questions. One such problem that has received significant attention is entity resolution. Formally, we are given a graph $G(V,E)$ with unknown edge set $E$, where $G$ is a union of $k$ (again unknown, but typically large, $O(n^\alpha)$ for $\alpha > 0$) disjoint cliques $G_i(V_i, E_i)$, $i = 1, \dots, k$. The goal is to retrieve the sets $V_i$ by making the minimum number of pairwise queries to an oracle (the crowd). When the answer to each query is correct, e.g. via…
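To make the query model in the abstract concrete, the following is a minimal sketch of a naive baseline, assuming every oracle answer is correct and no side information is used. It is not the paper's algorithm: the function name `cluster_with_oracle`, the `same_cluster` callback, and the toy ground truth are hypothetical and exist only for illustration. Each new item is compared against one representative of every cluster found so far, so the number of pairwise queries is at most the number of items times the number of clusters.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_with_oracle(
    items: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Recover clusters using pairwise queries to a noiseless oracle.

    Assumption: same_cluster(u, v) always answers correctly. With noisy
    answers this baseline can place items in the wrong cluster.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for item in items:
        for cluster in clusters:
            queries += 1  # one pairwise question asked to the crowd
            if same_cluster(cluster[0], item):
                cluster.append(item)
                break
        else:
            # No existing cluster matched, so this item starts a new one.
            clusters.append([item])
    return clusters, queries


# Hypothetical ground truth: record name -> entity id.
truth = {"a1": 0, "a2": 0, "b1": 1, "c1": 2, "b2": 1}


def oracle(u: str, v: str) -> bool:
    # Stand-in for a crowd worker who always answers correctly.
    return truth[u] == truth[v]


clusters, queries = cluster_with_oracle(list(truth), oracle)
print(clusters, queries)
```

On this toy input the sketch recovers three clusters from five records. Handling noisy answers or exploiting side information, which are the settings the paper targets, would require a different strategy than this baseline.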
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Privacy-Preserving Technologies in Data · Data Quality and Management
