Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

Vincent Cohen-Addad; David Saulpic; Chris Schwiegelshohn

arXiv:2310.04076·cs.DS·October 9, 2023

TL;DR

This paper develops deterministic clustering sketches and algorithms in high-dimensional spaces, matching the efficiency of randomized methods and advancing the understanding of deterministic approaches in clustering problems.

Contribution

It introduces near-optimal deterministic sketches and a $(1+\varepsilon)$-approximation algorithm for high-dimensional $k$-median and $k$-means clustering.

Findings

01

Deterministic sketches have size bounds close to randomized ones.

02

A deterministic $(1+\varepsilon)$-approximation algorithm runs in near-optimal time.

03

A new randomized coreset improves previous results by a factor of $k$.
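
For reference, the findings above rest on the standard clustering objective and coreset guarantee. The following is a sketch in common notation, which is an assumption here and may differ from the paper's own: $P \subset \mathbb{R}^d$ is the input set of $n$ points, $C$ is a set of $k$ centers, and $z = 1$ gives $k$-median while $z = 2$ gives $k$-means.

$$
\mathrm{cost}_z(P, C) = \sum_{p \in P} \min_{c \in C} \|p - c\|_2^z
$$

A weighted set $\Omega$ with weights $w$ is then usually called an $\varepsilon$-coreset if, for every set $C$ of $k$ centers,

$$
\left| \sum_{q \in \Omega} w(q) \min_{c \in C} \|q - c\|_2^z - \mathrm{cost}_z(P, C) \right| \le \varepsilon \cdot \mathrm{cost}_z(P, C).
$$

Under the same notation, a $(1+\varepsilon)$-approximation algorithm returns centers $C$ with $\mathrm{cost}_z(P, C) \le (1+\varepsilon) \min_{C'} \mathrm{cost}_z(P, C')$, where the minimum ranges over all sets $C'$ of $k$ centers.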

Abstract

In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension $d$, the precision parameter $\varepsilon^{-1}$, or $k$. Furthermore, there is no coreset construction that succeeds with probability $1 - 1/n$ and whose size does not depend on the number of input points, $n$. This has led researchers in the area to ask what the power of randomness is for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov'20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension is $\Omega(1)$ for both $k$-median and…

Taxonomy

Topics: Advanced Clustering Algorithms Research