Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

Buddhima Gamlath; Sangxia Huang; Ola Svensson

arXiv:1803.00926·cs.DS·November 7, 2018

TL;DR

This paper develops semi-supervised algorithms for $k$-means clustering that use a limited number of same-cluster oracle queries to efficiently recover a clustering whose cost is within a $(1 + \epsilon)$ factor of the optimal cost and whose accuracy is at least $(1 - \epsilon)$, in both Euclidean and finite metric spaces.

Contribution

It introduces new query-efficient algorithms for semi-supervised $k$-means clustering, with theoretical guarantees on both cost and accuracy, in Euclidean and finite metric spaces.

Findings

01 Query complexity depends on the number of clusters, dimension, and candidate centers.

02 Algorithms achieve near-optimal clustering with high probability.

03 Lower bounds match the upper bounds on query complexity.

Abstract

We study $k$-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are sufficient to efficiently recover a clustering that, with probability at least $(1 - \delta)$, simultaneously has a cost of at most $(1 + \epsilon)$ times the optimal cost and an accuracy of at least $(1 - \epsilon)$? We show how to achieve such a clustering on $n$ points with $O((k^{2} \log n) \cdot m(Q, \epsilon^{4}, \delta / (k \log n)))$ oracle queries, when the $k$ clusters can be learned with an $\epsilon'$ error and a failure probability $\delta'$ using $m(Q, \epsilon', \delta')$ labeled samples in the supervised setting, where $Q$ is the set of candidate cluster centers. We show that $m(Q, \epsilon', \delta')$ is small both for $k$-means…
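To make the query model concrete, here is a minimal Python sketch of a same-cluster oracle and of one standard way to turn pairwise answers into labels for a sample of points. It is an illustration only, not the algorithm from the paper: the names `make_same_cluster_oracle` and `label_sample` are hypothetical, and the oracle is simulated from a hidden label list that a real instance would not expose. Each sampled point is compared against one representative per cluster discovered so far, so labeling a point costs at most $k$ queries.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def make_same_cluster_oracle(
    hidden_labels: Sequence[int],
) -> Tuple[Callable[[int, int], bool], Dict[str, int]]:
    """Simulate a same-cluster oracle from hidden ground-truth labels.

    Returns the oracle and a counter recording how many queries were asked.
    (Assumption: in a real setting the answers would come from an expert,
    not from a label list.)
    """
    counter = {"queries": 0}

    def oracle(i: int, j: int) -> bool:
        counter["queries"] += 1
        return hidden_labels[i] == hidden_labels[j]

    return oracle, counter


def label_sample(
    sample: Sequence[int], oracle: Callable[[int, int], bool]
) -> Tuple[Dict[int, int], List[int]]:
    """Assign a cluster id to every sampled point using same-cluster queries.

    Each point is compared against one representative of every cluster
    found so far; if none matches, the point starts a new cluster.
    """
    representatives: List[int] = []  # one point index per discovered cluster
    assignment: Dict[int, int] = {}
    for i in sample:
        for cluster_id, rep in enumerate(representatives):
            if oracle(i, rep):
                assignment[i] = cluster_id
                break
        else:
            # No representative matched: i is the first point of a new cluster.
            assignment[i] = len(representatives)
            representatives.append(i)
    return assignment, representatives


if __name__ == "__main__":
    hidden = [0, 0, 1, 2, 1, 0, 2, 2]  # hypothetical optimal clustering
    oracle, counter = make_same_cluster_oracle(hidden)
    assignment, reps = label_sample(range(len(hidden)), oracle)
    print(assignment, reps, counter["queries"])
```

Under this scheme, a labeled sample of size $m$ costs at most $k \cdot m$ queries, which is the sense in which a supervised sample bound such as $m(Q, \epsilon', \delta')$ can be converted into an oracle-query bound.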
