Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle
Pan Peng, Jiapeng Zhang

TL;DR
This paper introduces a nearly query-optimal, time-efficient clustering algorithm for multiple clusters with a faulty oracle, advancing the theoretical understanding of clustering under noisy conditions.
Contribution
It presents a new algorithm that achieves near-optimal query complexity and polynomial running time for clustering with a faulty oracle, for any constant number of clusters.
Findings
Achieves query complexity that is optimal up to a logarithmic factor.
Runs in polynomial time for any constant number of clusters.
Extends previous work to general k clusters and various bias regimes.
Abstract
Motivated by applications in crowdsourced entity resolution in databases, signed edge prediction in social networks, and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with a faulty oracle. In this model, given a set of $n$ items which belong to $k$ unknown groups (or clusters), our goal is to recover the clusters by asking pairwise queries to an oracle. This oracle can answer queries of the form "do items $u$ and $v$ belong to the same cluster?". However, the answer to each pairwise query errs with probability $\varepsilon$, for some $\varepsilon\in(0,\frac12)$. Mazumdar and Saha provided two algorithms under this model: one is query-optimal but time-inefficient (i.e., it runs in quasi-polynomial time); the other is time-efficient (i.e., it runs in polynomial time) but query-suboptimal. Larsen, Mitzenmacher and…
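To make the query model concrete, here is a minimal Python sketch of a simulated faulty oracle. It is not the paper's algorithm, only an illustration of the setting described in the abstract. The class name `FaultyOracle`, the parameter `eps` (the per-query error probability), the `seed` argument, and the query counter are all hypothetical names introduced here. The sketch also assumes that answers are persistent, meaning that asking about the same pair twice returns the same (possibly wrong) answer, so repetition alone cannot average out the noise.

```python
import random


class FaultyOracle:
    """Hypothetical simulator of a pairwise same-cluster oracle with noisy answers."""

    def __init__(self, labels, eps, seed=0):
        # labels[i] is the hidden cluster of item i; eps is the error probability.
        # eps = 0 gives a noiseless oracle, which is handy for sanity checks.
        if not 0 <= eps < 0.5:
            raise ValueError("eps must lie in [0, 1/2)")
        self.labels = list(labels)
        self.eps = eps
        self.rng = random.Random(seed)
        self.answers = {}      # assumption: answers are persistent per pair
        self.num_queries = 0   # number of distinct pairs queried so far

    def same_cluster(self, u, v):
        """Answer 'do items u and v belong to the same cluster?', wrong w.p. eps."""
        key = (min(u, v), max(u, v))
        if key not in self.answers:
            self.num_queries += 1
            truth = self.labels[u] == self.labels[v]
            flipped = self.rng.random() < self.eps
            self.answers[key] = truth != flipped  # flip the true answer if flipped
        return self.answers[key]


if __name__ == "__main__":
    oracle = FaultyOracle(labels=[0, 0, 1, 1, 2], eps=0.3, seed=1)
    print(oracle.same_cluster(0, 1), oracle.num_queries)
```

An algorithm in this model is judged by how many distinct pairs it queries (the counter above) and by its running time; the two quantities are the ones the abstract trades off.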
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Mobile Crowdsensing and Crowdsourcing
