Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering
Buddhima Gamlath, Sangxia Huang, Ola Svensson

TL;DR
This paper develops semi-supervised algorithms for $k$-means clustering that efficiently approximate the optimal clustering with high accuracy using a limited number of oracle queries, applicable in Euclidean and finite metric spaces.
Contribution
It introduces new query-efficient algorithms for semi-supervised $k$-means clustering with theoretical guarantees on cost and accuracy, applicable in various metric spaces.
Findings
Query complexity depends on the number of clusters, dimension, and candidate centers.
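With probability at least $1 - \delta$, the algorithms return a clustering whose cost is at most $(1 + \epsilon)$ times the optimal cost and whose accuracy is at least $1 - \epsilon$.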
Lower bounds match the upper bounds on query complexity.
Abstract
We study $k$-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are sufficient to efficiently recover a clustering that, with probability at least $(1 - \delta)$, simultaneously has a cost of at most $(1 + \epsilon)$ times the optimal cost and an accuracy of at least $(1 - \epsilon)$? We show how to achieve such a clustering on $n$ points with $O\big((k^2 \log n) \cdot m(Q, \epsilon^4, \delta/(k \log n))\big)$ oracle queries, when the $k$ clusters can be learned with an $\epsilon'$ error and a failure probability $\delta'$ using $m(Q, \epsilon', \delta')$ labeled samples in the supervised setting, where $Q$ is the set of candidate cluster centers. We show that $m(Q, \epsilon', \delta')$ is small both for $k$-means…
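To make the query model concrete, here is a minimal Python sketch of a same-cluster oracle together with the $k$-means cost of a clustering. It is an illustration only and not the paper's algorithm: the names `SameClusterOracle`, `label_by_queries`, and `kmeans_cost` are hypothetical, and the labeling routine is a naive baseline that compares each point against one representative per discovered cluster, so it uses up to $nk$ queries rather than the bounds stated in the abstract.

```python
class SameClusterOracle:
    """Hypothetical oracle: answers whether two point indices lie in the
    same cluster of a fixed target clustering, and counts the queries."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.queries = 0

    def same_cluster(self, i, j):
        self.queries += 1
        return self._labels[i] == self._labels[j]


def label_by_queries(n, oracle):
    """Naive baseline (not the paper's method): compare each point with one
    representative of every cluster found so far; at most n * k queries."""
    representatives = []          # one point index per discovered cluster
    assignment = [None] * n
    for i in range(n):
        for cluster_id, rep in enumerate(representatives):
            if oracle.same_cluster(i, rep):
                assignment[i] = cluster_id
                break
        else:
            # No existing representative matched, so point i opens a new cluster.
            representatives.append(i)
            assignment[i] = len(representatives) - 1
    return assignment


def kmeans_cost(points, assignment):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, cluster_id in zip(points, assignment):
        clusters.setdefault(cluster_id, []).append(point)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return cost
```

For example, with `points = [(0, 0), (0, 2), (10, 0), (10, 2)]` and target labels `[0, 0, 1, 1]`, calling `label_by_queries(4, SameClusterOracle([0, 0, 1, 1]))` recovers the target clustering up to a renaming of cluster ids. The question posed in the abstract is how far the number of such queries can be reduced while still guaranteeing near-optimal cost and high accuracy.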
