Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds
Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn

TL;DR
This paper shows that Sensitivity Sampling produces size-optimal coresets for $k$-means, with smaller coresets on well-clusterable data, and extends these results to $k$-median and general metric spaces.
Contribution
It proves that Sensitivity Sampling yields size-optimal coresets for worst-case and well-clusterable data, and extends these bounds to broader clustering problems and metric spaces.
Findings
Sensitivity Sampling achieves optimal coreset sizes for worst-case $k$-means.
For well-clusterable data, coresets are significantly smaller, of size $\tilde{O}(k/\varepsilon^2)$.
Coreset size lower bounds match the upper bounds for stable instances.
Abstract
Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1 \pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space the cost for solution $S$ is $\sum_{p \in P} \min_{s \in S} \|p - s\|^2$. A very popular method for coreset construction, both in theory and practice, is Sensitivity Sampling, where points are sampled in proportion to their importance. We show that Sensitivity Sampling yields optimal coresets of size $\tilde{O}(k/\varepsilon^2 \cdot \min(\sqrt{k}, \varepsilon^{-2}))$ for worst-case instances. Uniquely among all known coreset algorithms, for well-clusterable data sets with $\Omega(1)$ cost stability, Sensitivity Sampling gives coresets of size $\tilde{O}(k/\varepsilon^2)$,…
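To make the sampling idea in the abstract concrete, here is a minimal pure-Python sketch of one common textbook variant of sensitivity sampling. It is an illustration under stated assumptions, not the authors' exact procedure: the importance score (a point's share of the cost plus the inverse size of its cluster), the function names `kmeans_cost` and `sensitivity_sample`, and the toy data are all hypothetical choices made for this example. It assumes an approximate solution `approx_centers` is already available.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum over p of w(p) * min_s ||p - s||^2."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, s) for s in centers)
               for p, w in zip(points, weights))

def sensitivity_sample(points, approx_centers, m, rng):
    """Draw m points with probability proportional to an importance score.

    Assumption: the score is (cost share) + 1 / (cluster size), a standard
    sensitivity proxy; the paper's precise scheme may differ.
    """
    # Assign every point to its nearest approximate center.
    assign, dists = [], []
    for p in points:
        d = [sq_dist(p, s) for s in approx_centers]
        j = min(range(len(d)), key=d.__getitem__)
        assign.append(j)
        dists.append(d[j])
    total = sum(dists)
    sizes = [0] * len(approx_centers)
    for j in assign:
        sizes[j] += 1
    scores = [(dists[i] / total if total > 0 else 0.0) + 1.0 / sizes[assign[i]]
              for i in range(len(points))]
    z = sum(scores)
    probs = [s / z for s in scores]
    # Sample with replacement according to the normalized scores.
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    # Inverse-probability weights make the weighted cost an unbiased estimate.
    return [points[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]

# Toy usage: three well-separated Gaussian clusters (hypothetical data).
rng = random.Random(0)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blobs for _ in range(200)]
coreset, weights = sensitivity_sample(points, blobs, 100, rng)
print(kmeans_cost(points, blobs), kmeans_cost(coreset, blobs, weights))
```

The two printed costs should be close for this fixed solution. A coreset guarantee is stronger: it requires the approximation to hold simultaneously for every candidate set of $k$ centers, which is what the size bounds in the abstract quantify.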
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Machine Learning and Algorithms
