Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal
Lingxiao Huang, Nisheeth K. Vishnoi

TL;DR
This paper introduces a nearly optimal importance sampling framework for constructing small coresets for Euclidean clustering problems, significantly improving previous bounds and providing new dimensionality reduction techniques.
Contribution
The paper presents a unified two-stage importance sampling method for coresets in Euclidean clustering, reducing coreset size and computational complexity compared to prior work.
Findings
Coreset size for k-median is O(ε^{-4} k)
New dimensionality reduction connects subspace approximation and clustering
Lower bound matches the upper bound in size dependence
Abstract
Given a collection of n points in R^d, the goal of the (k,z)-clustering problem is to find a subset of k "centers" that minimizes the sum of the z-th powers of the Euclidean distance of each point to the closest center. Special cases of the (k,z)-clustering problem include the k-median and k-means problems. Our main result is a unified two-stage importance sampling framework that constructs an ε-coreset for the (k,z)-clustering problem. Compared to the results for (k,z)-clustering in [Feldman and Langberg, STOC 2011], our framework saves an ε^2 d factor in the coreset size. Compared to the results for (k,z)-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a poly(k) factor in the coreset size and avoids the exp(k/ε) term in the construction time. Specifically, our coreset for k-median…
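To make the objective in the abstract concrete, here is a minimal Python sketch of the (k,z)-clustering cost, together with a generic one-stage importance sampler that returns a weighted subset. This is an illustration only: it is not the paper's two-stage framework, and the function names, the 50/50 mixing with the uniform distribution, and the use of a fixed reference set of centers are all assumptions made for this example.

```python
import math
import random


def clustering_cost(points, centers, z, weights=None):
    """Weighted (k, z)-clustering cost: sum_i w_i * min_c dist(p_i, c) ** z."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )


def importance_sample(points, centers, z, m, rng):
    """Draw m weighted points (hypothetical helper, not the paper's scheme).

    Each point is sampled with probability proportional to a mix of its
    cost share under the reference `centers` and the uniform distribution,
    then reweighted by 1 / (m * probability). For those reference centers
    the weighted cost of the sample is an unbiased estimate of the full cost.
    """
    n = len(points)
    costs = [min(math.dist(p, c) for c in centers) ** z for p in points]
    total = sum(costs)
    if total > 0:
        probs = [0.5 * c / total + 0.5 / n for c in costs]
    else:
        probs = [1.0 / n] * n  # all points sit on a center: sample uniformly
    idx = rng.choices(range(n), weights=probs, k=m)
    sample = [points[i] for i in idx]
    sample_weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, sample_weights


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    ctrs = [(0.0, 0.0), (2.0, 2.0)]
    sub, w = importance_sample(pts, ctrs, z=1, m=100, rng=rng)
    print("full cost:   ", clustering_cost(pts, ctrs, 1))
    print("sampled cost:", clustering_cost(sub, ctrs, 1, w))
```

Setting z = 1 gives the k-median cost and z = 2 the k-means cost. An ε-coreset must preserve this cost up to a (1 ± ε) factor for every choice of k centers simultaneously, which is a much stronger guarantee than the single-center-set estimate this sketch provides.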
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Computational Geometry and Mesh Generation · Automated Road and Building Extraction
