Approximate Clustering with Same-Cluster Queries
Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, Amit Kumar

TL;DR
This paper introduces a polynomial-time approximation algorithm for the $k$-means clustering problem using a limited number of same-cluster queries, removing the need for margin assumptions and providing bounds on query complexity.
Contribution
It extends semi-supervised clustering with same-cluster queries to achieve $(1 + \varepsilon)$-approximation without margin assumptions, using a query complexity independent of dataset size.
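For reference, the standard $k$-means objective and the meaning of a $(1 + \varepsilon)$-approximation can be written as follows. The symbols $X$ (the dataset), $C$ (a set of $k$ centers), and $\Phi$ (the cost) are notation assumed here for illustration and may not match the paper's own.

$$
\Phi(C, X) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2,
\qquad
\Phi(C, X) \le (1 + \varepsilon) \cdot \min_{C' \,:\, |C'| = k} \Phi(C', X).
$$

A center set $C$ satisfying the inequality on the right is a $(1 + \varepsilon)$-approximate solution.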
Findings
Achieves $(1 + \varepsilon)$-approximation for $k$-means with few queries
Provides a lower bound on query complexity under the Exponential Time Hypothesis (ETH)
Modifies $k$-means++ to obtain constant-factor approximation
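As a rough illustration of how same-cluster queries could be combined with $k$-means++ style seeding, the sketch below samples candidates with probability proportional to squared distance from the chosen centers ($D^2$ sampling) and rejects any candidate the oracle places in an already-covered cluster. This is a minimal sketch under stated assumptions, not the authors' algorithm: the names `make_oracle`, `query_seeding`, and `max_tries` are hypothetical, and the oracle is simulated from ground-truth labels.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def make_oracle(labels):
    """Hypothetical same-cluster oracle simulated from ground-truth labels.

    Returns the query function and a counter of how many queries were made.
    """
    count = {"queries": 0}

    def same_cluster(i, j):
        count["queries"] += 1
        return labels[i] == labels[j]

    return same_cluster, count

def query_seeding(points, k, same_cluster, rng, max_tries=50):
    """Pick up to k center indices, one per cluster (assumed sketch).

    Candidates come from D^2 sampling; a candidate is kept only if the
    oracle says it shares a cluster with none of the current centers.
    """
    centers = [rng.randrange(len(points))]
    while len(centers) < k:
        # D^2 weights: squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, points[c]) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point coincides with a center; nothing to sample
        for _ in range(max_tries):
            cand = rng.choices(range(len(points)), weights=weights, k=1)[0]
            if not any(same_cluster(cand, c) for c in centers):
                centers.append(cand)
                break
        else:
            break  # no uncovered cluster found within max_tries
    return centers

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (100.0, 0.0), (100.0, 1.0)]
    oracle, count = make_oracle([0, 0, 1, 1])
    print(query_seeding(pts, 2, oracle, random.Random(0)), count["queries"])
```

The number of oracle calls here depends on `k` and `max_tries` rather than on the number of points, which mirrors the summary's claim of query complexity independent of dataset size; the exact bounds are in the paper.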
Abstract
Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?" There are many clustering contexts where such same-cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the $k$-means clustering problem, with an additional margin assumption, can be solved efficiently if one is allowed same-cluster queries. This is interesting since the $k$-means problem, even with the margin assumption, is NP-hard. In this paper, we extend the work of Ashtiani et al. to the approximation setting, showing that a few such same-cluster queries enable one to get a polynomial-time $(1 + \varepsilon)$-approximation algorithm for the…
Taxonomy
Topics: Data Management and Algorithms · Advanced Graph Theory Research · Advanced Clustering Algorithms Research
