Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Lingxiao Huang; Nisheeth K. Vishnoi

arXiv:2004.06263 · cs.CG · May 15, 2020 · 5 cites

TL;DR

This paper introduces a nearly optimal importance sampling framework for constructing small coresets for Euclidean clustering problems, significantly improving previous bounds and providing new dimensionality reduction techniques.

Contribution

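The paper presents a unified two-stage importance sampling method that constructs an $ε$-coreset for $(k, z)$-clustering in Euclidean space. It saves a factor of $ε^{2} d$ in coreset size relative to [Feldman and Langberg, STOC 2011], and relative to [Sohler and Woodruff, FOCS 2018] it saves a $\mathrm{poly}(k)$ factor in coreset size and avoids the $\exp(k/ε)$ term in construction time.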

Findings

01. Coreset size for $k$-median is $O(ε^{-4} k)$

02. New dimensionality reduction connects subspace approximation and clustering

03. Lower bound matches the upper bound in size dependence

Abstract

Given a collection of $n$ points in $\mathbb{R}^{d}$, the goal of the $(k, z)$-clustering problem is to find a subset of $k$ "centers" that minimizes the sum of the $z$-th powers of the Euclidean distance of each point to the closest center. Special cases of the $(k, z)$-clustering problem include the $k$-median and $k$-means problems. Our main result is a unified two-stage importance sampling framework that constructs an $ε$-coreset for the $(k, z)$-clustering problem. Compared to the results for $(k, z)$-clustering in [Feldman and Langberg, STOC 2011], our framework saves a $ε^{2} d$ factor in the coreset size. Compared to the results for $(k, z)$-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a $\mathrm{poly}(k)$ factor in the coreset size and avoids the $\exp(k/ε)$ term in the construction time. Specifically, our coreset for $k$-median…
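To make the objective in the abstract concrete, the sketch below computes the $(k, z)$-clustering cost (the sum of $z$-th powers of each point's distance to its nearest center) and draws a weighted sample by generic one-stage importance sampling. It is only an illustration and is not the paper's two-stage framework: the function names, the `rough_centers` input, and the 50/50 mix of cost share and uniform mass are all assumptions made for this example.

```python
import math
import random


def clustering_cost(points, centers, z, weights=None):
    """Weighted sum of (distance to the nearest center) ** z over all points."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )


def importance_sample(points, rough_centers, z, m, rng):
    """Draw m weighted points (hypothetical one-stage sketch, not the paper's method).

    rough_centers is an assumed, externally supplied approximate solution.
    """
    n = len(points)
    costs = [min(math.dist(p, c) for c in rough_centers) ** z for p in points]
    total = sum(costs)
    if total > 0:
        # Assumed mix: half cost share, half uniform, so every point can be drawn.
        probs = [0.5 * c / total + 0.5 / n for c in costs]
    else:
        probs = [1.0 / n] * n
    idx = rng.choices(range(n), weights=probs, k=m)
    # Weight 1 / (m * p_i) keeps the weighted cost an unbiased estimate
    # of the full cost for any fixed set of centers.
    return [points[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]


# Small demo on two synthetic Gaussian blobs (illustrative data only).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
rough = [(0.0, 0.0), (10.0, 10.0)]
sample, sample_weights = importance_sample(points, rough, z=2, m=200, rng=rng)
full_cost = clustering_cost(points, rough, 2)
approx_cost = clustering_cost(sample, rough, 2, sample_weights)
```

Setting `z=1` gives the $k$-median cost and `z=2` the $k$-means cost. An $ε$-coreset requires the weighted cost to stay within a $(1 \pm ε)$ factor of the full cost for every choice of $k$ centers simultaneously. The sketch does not verify that guarantee; it only compares the two costs at a single center set.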

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Computational Geometry and Mesh Generation · Automated Road and Building Extraction