The Power of Uniform Sampling for Coresets
Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H.-C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

TL;DR
This paper introduces a meta-theorem that enables the construction of smaller, uniform-sampling-based coresets for various constrained clustering problems, often independent of input size, improving efficiency and applicability.
Contribution
The authors develop a meta-theorem that allows coresets for constrained clustering to be built via uniform sampling instead of the widely used importance sampling, yielding coresets whose size can be independent of the number of input points.
Findings
Coresets for multiple constrained clustering problems are smaller and sometimes the first of their kind.
Uniform sampling can produce coresets with size independent of the number of input points.
New bounds for 1-median coresets in low-dimensional Euclidean spaces.
Abstract
Motivated by practical generalizations of the classic k-median and k-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely-used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of n, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering,…
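As a rough illustration of the uniform-sampling idea on a single ring instance, the following is a minimal sketch and not the paper's construction. It assumes a "ring" means points whose distance from a reference center lies in a bounded range, uses the plain (unconstrained) k-median cost, and picks the sample size by hand. The names `make_ring`, `uniform_coreset`, and `k_median_cost` are hypothetical helpers introduced only for this example.

```python
import math
import random


def make_ring(n, r_inner, r_outer, rng):
    """Generate n points in the plane with radius in [r_inner, r_outer] (assumed ring shape)."""
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(r_inner, r_outer)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


def uniform_coreset(points, m, rng):
    """Draw m points uniformly without replacement; each sampled point gets weight n/m."""
    n = len(points)
    m = min(m, n)
    sample = rng.sample(points, m)
    weight = n / m
    return [(p, weight) for p in sample]


def k_median_cost(weighted_points, centers):
    """Weighted sum of distances from each point to its nearest center."""
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in weighted_points
    )


rng = random.Random(0)
ring = make_ring(20000, 1.0, 2.0, rng)       # illustrative input, not from the paper
coreset = uniform_coreset(ring, 500, rng)    # sample size chosen by hand

centers = [(0.0, 0.0), (3.0, 0.0)]           # an arbitrary candidate solution
full_cost = k_median_cost([(p, 1.0) for p in ring], centers)
approx_cost = k_median_cost(coreset, centers)
rel_err = abs(full_cost - approx_cost) / full_cost
print(f"full={full_cost:.1f} coreset={approx_cost:.1f} rel_err={rel_err:.4f}")
```

Because every sampled point carries the same weight n/m, the weighted sample preserves the total number of points, which is one intuition for why uniform samples are convenient when the objective has size or capacity constraints. The sketch evaluates only one candidate set of centers; a coreset guarantee must hold for all of them.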
Taxonomy
Topics: Topological and Geometric Data Analysis · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
