Greedy bi-criteria approximations for $k$-medians and $k$-means
Daniel Hsu, Matus Telgarsky

TL;DR
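This paper analyzes greedy bi-criteria approximation algorithms for $k$-medians and $k$-means clustering: by selecting somewhat more than $k$ centers from a candidate set (all data points, sampled points, or a constructed set), they achieve cost within a small factor of the optimal cost with $k$ centers.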
Contribution
It analyzes a natural greedy procedure with provable bi-criteria approximation guarantees for $k$-medians and $k$-means, covering candidates taken from the data, candidates sampled in proportion to cluster cost, and larger constructed candidate sets.
Findings
Achieves a $2+\epsilon$ approximation with $O(k\log(1/\epsilon))$ centers.
Provides a $1+\epsilon$ approximation using large candidate sets and stochastic gradient descent.
Includes empirical evaluation showing favorable results against $k$-means++.
Abstract
This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of $k$-medians and $k$-means, the key results are as follows. When the method considers all data points as candidate centers, then selecting $O(k\log(1/\epsilon))$ centers achieves cost at most $2+\epsilon$ times the optimal cost with $k$ centers. Alternatively, the same guarantees hold if each round samples candidate centers proportionally to their cluster cost (as with $k$-means++, but holding centers fixed). In the case of $k$-means, considering an augmented set of candidate centers gives $1+\epsilon$ approximation…
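The greedy procedure described in the abstract can be illustrated with a minimal sketch for the case where every data point is a candidate center. This is an illustration under stated assumptions, not the authors' implementation: the function names (`sq_dist`, `clustering_cost`, `greedy_centers`) and the parameter `num_rounds` are hypothetical, the cost is the $k$-means cost (squared Euclidean distance), and no attempt is made at efficiency.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance: the k-means cost of serving p from center q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def clustering_cost(points: List[Point], centers: List[Point]) -> float:
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def greedy_centers(points: List[Point], num_rounds: int) -> List[Point]:
    """Pick `num_rounds` centers greedily, using the data points as candidates.

    Each round adds the candidate that yields the lowest total cost,
    i.e. the largest decrease in clustering cost. Ties go to the first
    candidate encountered (an arbitrary choice made for this sketch).
    """
    # best[i] is the cost of point i under the centers chosen so far.
    best = [float("inf")] * len(points)
    centers: List[Point] = []
    for _ in range(num_rounds):
        chosen, chosen_cost = None, float("inf")
        for cand in points:
            # Total cost if `cand` were added to the current centers.
            cost = sum(min(b, sq_dist(p, cand)) for p, b in zip(points, best))
            if cost < chosen_cost:
                chosen, chosen_cost = cand, cost
        centers.append(chosen)
        # Update each point's cost to account for the new center.
        best = [min(b, sq_dist(p, chosen)) for p, b in zip(points, best)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    picked = greedy_centers(data, num_rounds=2)
    print(picked, clustering_cost(data, picked))
```

In the bi-criteria setting, `num_rounds` would be set larger than $k$ (on the order of $k\log(1/\epsilon)$ per the stated result) and the resulting cost compared against the optimum with $k$ centers. Replacing `sq_dist` with the unsquared distance gives the analogous sketch for $k$-medians.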
Taxonomy
Topics: Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
