Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

Nikhil Bansal; Vincent Cohen-Addad; Milind Prabhu; David Saulpic; Chris Schwiegelshohn

arXiv:2405.01339 · cs.DS · May 3, 2024

TL;DR

This paper shows that Sensitivity Sampling produces size-optimal coresets for $k$-means, both in the worst case and on well-clusterable (cost-stable) data, and extends these results to $k$-median and general metric spaces.

Contribution

It proves that Sensitivity Sampling yields size-optimal coresets for worst-case and well-clusterable data, and extends these bounds to broader clustering problems and metric spaces.

Findings

01

Sensitivity Sampling achieves optimal coreset sizes for worst-case $k$-means.

02

For well-clusterable data, coresets are significantly smaller, of size $\tilde{O}(k/ε^{2})$.

03

Coreset size lower bounds match the upper bounds for stable instances.

Abstract

Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1 \pm ε)$ factor. For $k$-means in $d$-dimensional Euclidean space, the cost of a solution $S$ is $\sum_{p \in P} \min_{s \in S} \| p - s \|^{2}$. A very popular method for coreset construction, both in theory and practice, is Sensitivity Sampling, where points are sampled in proportion to their importance. We show that Sensitivity Sampling yields optimal coresets of size $\tilde{O}(k / ε^{2} \cdot \min(\sqrt{k}, ε^{-2}))$ for worst-case instances. Uniquely among all known coreset algorithms, for well-clusterable data sets with $Ω(1)$ cost stability, Sensitivity Sampling gives coresets of size $\tilde{O}(k / ε^{2})$,…
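
To make the sampling idea in the abstract concrete, here is a minimal Python sketch of the $k$-means cost and of an importance-sampling step in the spirit of Sensitivity Sampling. It rests on several assumptions not stated in this excerpt: the function names (`kmeans_cost`, `sensitivity_sample`) are hypothetical, the approximate centers are supplied by the caller, and the importance score (a point's share of the total cost plus the inverse size of its cluster) is a commonly used sensitivity proxy rather than necessarily the exact distribution analysed in the paper.

```python
import numpy as np


def kmeans_cost(points, centers, weights=None):
    """k-means cost: sum over points of w_p * min_s ||p - s||^2."""
    # Squared distance from every point to every center, shape (n, k).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)
    if weights is None:
        return float(nearest.sum())
    return float((weights * nearest).sum())


def sensitivity_sample(points, approx_centers, m, rng):
    """Draw m weighted points using a sensitivity-style importance score.

    Assumption: the score is cost share + 1 / cluster size, computed with
    respect to the caller-supplied approximate centers.
    """
    d2 = ((points[:, None, :] - approx_centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    nearest = d2[np.arange(len(points)), assign]
    cluster_sizes = np.bincount(assign, minlength=len(approx_centers))

    total = nearest.sum()
    cost_share = nearest / total if total > 0 else np.zeros_like(nearest)
    scores = cost_share + 1.0 / cluster_sizes[assign]
    probs = scores / scores.sum()

    # Sample with replacement; inverse-probability weights keep the
    # weighted cost an unbiased estimate of the full cost.
    idx = rng.choice(len(points), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return points[idx], weights
```

In use, one would compare `kmeans_cost(points, S)` against `kmeans_cost(sample, S, weights)` for candidate solutions `S`. The coreset guarantee described in the abstract asks that these agree up to a $(1 \pm ε)$ factor for every `S`, which this sketch does not verify; it only illustrates the sampling and reweighting mechanics.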

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Machine Learning and Algorithms