Greedy bi-criteria approximations for $k$-medians and $k$-means

Daniel Hsu; Matus Telgarsky

arXiv:1607.06203 · cs.DS · July 22, 2016 · 5 cites

TL;DR

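This paper presents greedy bi-criteria approximation algorithms for $k$-medians and $k$-means clustering. Instead of using fewer centers, they select somewhat more than $k$ centers, $O(k\log(1/\epsilon))$, and in exchange achieve cost within a $2+\epsilon$ factor (or, with larger candidate sets, a $1+\epsilon$ factor) of the optimal cost with $k$ centers, using sampling and candidate-set strategies.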

Contribution

It analyzes a natural greedy algorithm with provable approximation guarantees for $k$-medians and $k$-means, together with variants that sample candidate centers or construct larger candidate sets.
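
The greedy procedure the abstract describes (grow a set of centers, each round adding the candidate that most decreases clustering cost) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes points in Euclidean space given as tuples, uses every data point as a candidate, and exposes a hypothetical `power` parameter (1 for the $k$-medians objective, 2 for $k$-means). Per the abstract, running it for `num_centers` $= O(k\log(1/\epsilon))$ rounds gives cost at most $2+\epsilon$ times the optimal $k$-center cost. The constant hidden in the $O(\cdot)$ is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cost(points: Sequence[Point], centers: Sequence[Point], power: int) -> float:
    """Sum over points of (distance to the nearest center) ** power.

    power=1 is the k-medians objective; power=2 is the k-means objective.
    """
    return sum(min(dist(p, c) for c in centers) ** power for p in points)


def greedy_centers(points: Sequence[Point], num_centers: int, power: int = 2) -> List[Point]:
    """Grow a center set greedily, using every data point as a candidate.

    Each round adds the candidate whose inclusion most decreases the cost.
    """
    centers: List[Point] = []
    # nearest[i] = current (distance ** power) from points[i] to its closest center.
    nearest = [math.inf] * len(points)
    for _ in range(num_centers):
        best_idx, best_total, best_nearest = -1, math.inf, nearest
        for j, cand in enumerate(points):
            # After adding cand, each point pays the cheaper of its old cost and its cost to cand.
            new_nearest = [min(old, dist(p, cand) ** power) for old, p in zip(nearest, points)]
            total = sum(new_nearest)
            if total < best_total:
                best_idx, best_total, best_nearest = j, total, new_nearest
        centers.append(points[best_idx])
        nearest = best_nearest
    return centers


# Usage on two well-separated pairs of points.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
chosen = greedy_centers(pts, num_centers=2, power=2)
print(chosen, cost(pts, chosen, power=2))
```

Each round scans all $n$ candidates and all $n$ points, so this sketch costs $O(n^2)$ distance evaluations per round and is only meant for small inputs.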

Findings

01

Achieves a $2+\epsilon$ approximation with $O(k\log(1/\epsilon))$ centers.

02

Provides a $1+\epsilon$ approximation using large candidate sets and stochastic gradient descent.

03

Includes an empirical evaluation showing favorable results against $k$-means++.
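
To make the $1+\epsilon$ result for $k$-means concrete, the sketch below builds an augmented candidate set and runs the greedy rule over it. The abstract gives only the size of this set, $n^{\lceil 1/\epsilon \rceil}$, and is truncated, so the construction here (centroids of all $\lceil 1/\epsilon \rceil$-tuples of data points) is an assumption. It is motivated by a standard fact: the mean of $m$ points sampled uniformly with replacement from a cluster has expected 1-means cost $(1+1/m)$ times the optimum, so $m = \lceil 1/\epsilon \rceil$ suffices for a $1+\epsilon$ factor. The names `augmented_candidates` and `greedy_kmeans` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (the per-point k-means cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augmented_candidates(points: Sequence[Point], eps: float) -> List[Point]:
    """ASSUMED construction: the centroid of every ordered t-tuple of data points
    (repetition allowed), t = ceil(1/eps). This gives exactly n ** t candidates,
    many of them duplicates."""
    t = math.ceil(1 / eps)
    dim = len(points[0])
    return [
        tuple(sum(p[d] for p in tup) / t for d in range(dim))
        for tup in itertools.product(points, repeat=t)
    ]


def greedy_kmeans(
    points: Sequence[Point], candidates: Sequence[Point], num_centers: int
) -> Tuple[List[Point], float]:
    """Greedy rule over an arbitrary candidate set; returns (centers, k-means cost)."""
    centers: List[Point] = []
    nearest = [math.inf] * len(points)  # current squared distance to closest center
    for _ in range(num_centers):
        best = None
        for cand in candidates:
            new_nearest = [min(old, sq_dist(p, cand)) for old, p in zip(nearest, points)]
            total = sum(new_nearest)
            if best is None or total < best[0]:
                best = (total, cand, new_nearest)
        _, chosen, nearest = best
        centers.append(chosen)
    return centers, sum(nearest)


# With data points as the only candidates, one center costs 4.0 here;
# the augmented set contains the true mean (1.0,), which costs 2.0.
pts = [(0.0,), (2.0,)]
cands = augmented_candidates(pts, eps=0.5)
print(len(cands), greedy_kmeans(pts, cands, num_centers=1))
```

The candidate count grows quickly: $n = 100$ and $\epsilon = 0.25$ already give $100^4 = 10^8$ candidates, so this sketch only runs on toy inputs.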

Abstract

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of $k$-medians and $k$-means, the key results are as follows.

• When the method considers all data points as candidate centers, then selecting $O(k\log(1/\epsilon))$ centers achieves cost at most $2+\epsilon$ times the optimal cost with $k$ centers.
• Alternatively, the same guarantees hold if each round samples $O(k/\epsilon^{5})$ candidate centers proportionally to their cluster cost (as with $k$-means++, but holding centers fixed).
• In the case of $k$-means, considering an augmented set of $n^{\lceil 1/\epsilon \rceil}$ candidate centers gives $1+\epsilon$ approximation…
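
The sampling variant in the second bullet can be sketched in the same style. This is a hedged illustration, not the paper's algorithm verbatim. It is specialized to $k$-means and interprets "cluster cost" as each point's current squared distance to its nearest chosen center. Sampling uniformly in the first round is an assumption of this sketch, since every point's cost is infinite before any center exists. The names `sampled_greedy` and `num_samples` are hypothetical; the abstract's $O(k/\epsilon^{5})$, with an unspecified constant, would set `num_samples`.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (the per-point k-means cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sampled_greedy(
    points: Sequence[Point],
    num_centers: int,
    num_samples: int,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Each round: draw num_samples candidates with probability proportional to
    each point's current cost, then add the candidate that lowers total cost most.
    Centers chosen in earlier rounds are never moved."""
    rng = rng if rng is not None else random.Random(0)
    centers: List[Point] = []
    nearest = [math.inf] * len(points)  # current squared distance to closest center
    for _ in range(num_centers):
        # Assumption: sample uniformly before any center exists (costs are infinite).
        weights = nearest if centers else [1.0] * len(points)
        if sum(weights) == 0:
            break  # every point already sits on a center; cost is zero
        pool = rng.choices(points, weights=weights, k=num_samples)
        best = None
        for cand in pool:
            new_nearest = [min(old, sq_dist(p, cand)) for old, p in zip(nearest, points)]
            total = sum(new_nearest)
            if best is None or total < best[0]:
                best = (total, cand, new_nearest)
        _, chosen, nearest = best
        centers.append(chosen)
    return centers


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(sampled_greedy(pts, num_centers=2, num_samples=8, rng=random.Random(1)))
```

With `num_samples=1` the comparison step disappears and the loop reduces to $k$-means++ seeding (a uniformly random first center, then each new center drawn with probability proportional to squared distance), which makes the relationship noted in the abstract explicit.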

Taxonomy

Topics: Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques