Coresets for $k$-Means and $k$-Median Clustering and their Applications
Sariel Har-Peled, Soham Mazumdar

TL;DR
This paper shows that small coresets exist for low-dimensional $k$-median and $k$-means clustering, which yields faster $(1+\varepsilon)$-approximation algorithms and insertion-only streaming algorithms with polylogarithmic space and update time.
Contribution
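The paper proves that, for a point set in low dimension, a small weighted coreset suffices to compute a $(1+\varepsilon)$-approximate $k$-median or $k$-means clustering, and uses this to speed up approximation algorithms and to handle insertion-only streams.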
Findings
Coresets of size $O(k \varepsilon^{-d} \log n)$ exist for low-dimensional data.
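Faster $(1+\varepsilon)$-approximate $k$-means and $k$-median algorithms are developed, with running time linear in $n$ for fixed $k$ and $\varepsilon$.
Streaming algorithms with polylogarithmic space and update time are proposed for insertion-only streams.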
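To make the coreset idea concrete, here is a minimal Python sketch. It is not the paper's construction: `grid_summary` is a hypothetical stand-in that keeps one input point per uniform grid cell, weighted by how many points fall in that cell. The names `kmeans_cost`, `grid_summary`, and the `cell` parameter are illustrative assumptions. The sketch only shows the contract a coreset is meant to satisfy: the weighted cost on the small set stays close to the cost on the full set for a given choice of centers.

```python
import math
import random

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum of weight * squared distance to the nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total

def grid_summary(points, cell):
    """Illustrative stand-in (not the paper's method): keep the first input
    point seen in each grid cell and count how many points share that cell."""
    buckets = {}
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        if key in buckets:
            buckets[key][1] += 1
        else:
            buckets[key] = [p, 1]
    reps = [rep for rep, _ in buckets.values()]
    weights = [count for _, count in buckets.values()]
    return reps, weights

# Usage: compare the cost on the full set P with the cost on the summary S.
rng = random.Random(0)
P = [(rng.random(), rng.random()) for _ in range(20000)]
S, w = grid_summary(P, cell=0.02)
centers = [(0.25, 0.25), (0.75, 0.75)]
full = kmeans_cost(P, [1] * len(P), centers)
approx = kmeans_cost(S, w, centers)
print(len(S), abs(approx - full) / full)
```

A uniform grid needs a number of cells that depends on the extent of the data, so it does not achieve the size bound stated above. It is used here only because it is easy to verify by hand.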
Abstract
In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means clustering for points in low dimension. In other words, we show that given a point set $P$ in $\mathbb{R}^d$, one can compute a weighted set $S \subseteq P$, of size $O(k \varepsilon^{-d} \log n)$, such that one can compute the $k$-median/means clustering on $S$ instead of on $P$, and get a $(1+\varepsilon)$-approximation. As a result, we improve the fastest known algorithms for $(1+\varepsilon)$-approximate $k$-means and $k$-median clustering. Our algorithms have linear running time for a fixed $k$ and $\varepsilon$. In addition, we can maintain the $(1+\varepsilon)$-approximate $k$-median or $k$-means clustering of a stream when points are being only inserted, using…
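One common way to maintain a summary under insertions is the merge-and-reduce (logarithmic) technique, sketched below in Python. This assumes a generic scheme and is not taken from the abstract. The class `StreamingSummary`, the helper `reduce_weighted`, and the parameters `block_size` and `cell` are hypothetical names, and the grid-based reduction is a simple stand-in for a real coreset construction. Level `i` holds either nothing or a summary of `block_size * 2**i` points, so only logarithmically many summaries are alive at any time.

```python
import math
import random

def reduce_weighted(items, cell):
    """Stand-in reduction (not the paper's): merge weighted points that fall
    in the same grid cell into one representative with the summed weight."""
    merged = {}
    for point, weight in items:
        key = tuple(math.floor(x / cell) for x in point)
        if key in merged:
            rep, total = merged[key]
            merged[key] = (rep, total + weight)
        else:
            merged[key] = (point, weight)
    return list(merged.values())

class StreamingSummary:
    """Insertion-only summary maintained like a binary counter."""

    def __init__(self, block_size, cell):
        self.block_size = block_size
        self.cell = cell
        self.buffer = []   # raw points not yet summarized
        self.levels = []   # levels[i] is None or a summary of block_size * 2**i points

    def insert(self, point):
        self.buffer.append((point, 1))
        if len(self.buffer) < self.block_size:
            return
        carry = reduce_weighted(self.buffer, self.cell)
        self.buffer = []
        i = 0
        # Propagate the carry upward, merging with each occupied level.
        while i < len(self.levels) and self.levels[i] is not None:
            carry = reduce_weighted(self.levels[i] + carry, self.cell)
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = carry

    def summary(self):
        """Return all weighted points currently stored (buffer plus levels)."""
        out = list(self.buffer)
        for level in self.levels:
            if level is not None:
                out.extend(level)
        return out

# Usage: stream 1000 random points through the structure.
rng = random.Random(1)
stream = StreamingSummary(block_size=64, cell=0.05)
for _ in range(1000):
    stream.insert((rng.random(), rng.random()))
print(len(stream.summary()), len(stream.levels))
```

With a true coreset as the reduction step, each merge can add error, so the per-level accuracy would need to be tightened to keep the overall guarantee. The grid stand-in sidesteps this because a representative never leaves its cell.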
Taxonomy
Topics: Complexity and Algorithms in Graphs · Data Management and Algorithms · Stochastic Gradient Optimization Techniques
