Clustering with Noisy Queries
Arya Mazumdar, Barna Saha

TL;DR
This paper establishes fundamental limits and develops efficient algorithms for clustering with noisy pairwise queries, applicable to crowdsourcing, social networks, and stochastic block models, even when the number of clusters is unknown.
Contribution
It provides the first information-theoretic lower bounds on the query complexity of clustering with noisy queries, together with algorithms that closely match them, in both the adaptive and non-adaptive settings and even when the number of clusters is unknown.
Findings
Derived lower bounds on query complexity for noisy clustering.
Designed algorithms that closely match the lower bounds, including computationally efficient algorithms for both the adaptive and non-adaptive settings.
Extended the approach to applications like entity resolution and social network analysis.
Abstract
In this paper, we initiate a rigorous theoretical study of clustering with noisy queries (or a faulty oracle). Given a set of n elements, our goal is to recover the true clustering by asking the minimum number of pairwise queries to an oracle. The oracle answers queries of the form "do elements u and v belong to the same cluster?"; the queries can be asked interactively (adaptive queries) or non-adaptively up front, but each answer can be erroneous with probability p. In this paper, we provide the first information-theoretic lower bound on the number of queries for clustering with a noisy oracle in both situations. We design novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown. Moreover, we design computationally efficient algorithms for both the adaptive and non-adaptive settings. The problem captures/generalizes…
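To make the query model concrete, here is a minimal Python sketch of a simulated noisy oracle and a naive adaptive baseline. It assumes that the oracle flips each true answer independently with probability p < 1/2 and that repeating a query on the same pair yields a fresh, independent answer; this is the simplest noise model and may differ from the one analyzed in the paper. The names make_noisy_oracle, majority_same and naive_adaptive_cluster are hypothetical, and the baseline is not the authors' algorithm. It compares each element against one representative per discovered cluster and takes a majority vote over repeated queries, so it spends at most about n·k·reps queries for k clusters. It is only a point of reference for the query complexity the paper studies.

```python
import random


def make_noisy_oracle(labels, p, rng):
    """Hypothetical faulty oracle for the query "are u and v in the same cluster?".

    Assumption: each answer is the true answer flipped independently with
    probability p (< 1/2), and repeated queries give fresh, independent answers.
    """
    def query(u, v):
        truth = labels[u] == labels[v]
        return (not truth) if rng.random() < p else truth
    return query


def majority_same(query, u, v, reps):
    """Ask the same pair `reps` times and return the strict-majority answer."""
    yes_votes = sum(query(u, v) for _ in range(reps))
    return yes_votes > reps // 2


def naive_adaptive_cluster(n, query, reps):
    """Naive adaptive baseline (NOT the paper's algorithm).

    Elements are inserted one at a time; each is compared against the first
    element (representative) of every existing cluster until a majority vote
    says "same cluster". Otherwise it opens a new cluster. The number of
    clusters does not need to be known in advance.
    Returns the clusters and the total number of oracle queries used.
    """
    clusters = []
    queries_used = 0
    for u in range(n):
        for cluster in clusters:
            queries_used += reps
            if majority_same(query, u, cluster[0], reps):
                cluster.append(u)
                break
        else:
            # No existing cluster accepted u, so it starts a new one.
            clusters.append([u])
    return clusters, queries_used


if __name__ == "__main__":
    labels = [i % 3 for i in range(30)]  # 30 elements in 3 hidden clusters
    oracle = make_noisy_oracle(labels, p=0.1, rng=random.Random(0))
    clusters, used = naive_adaptive_cluster(len(labels), oracle, reps=101)
    print(clusters, used)
```

Repetition with majority voting is the simplest way to suppress independent noise. Its cost grows with both the number of clusters and the number of repetitions, which is why tight bounds on the number of queries are the quantity of interest here.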
Taxonomy
Topics: Data Quality and Management · Complex Network Analysis Techniques · Advanced Clustering Algorithms Research
