Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation
Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

TL;DR
This paper develops deterministic clustering sketches and algorithms in high-dimensional spaces, matching the efficiency of randomized methods and advancing the understanding of deterministic approaches in clustering problems.
Contribution
It introduces near-optimal deterministic sketches and a $(1+\varepsilon)$-approximation algorithm for high-dimensional $k$-median and $k$-means clustering.
Findings
Deterministic sketches have size bounds close to randomized ones.
A deterministic $(1+\varepsilon)$-approximation algorithm runs in near-optimal time.
A new randomized coreset improves previous results by a factor of $k$.
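To make the notion of a coreset in these findings concrete, here is a minimal Python sketch of the guarantee a coreset is meant to provide: a small weighted point set whose $k$-means cost is close to that of the full input for any choice of centers. The helper names (`kmeans_cost`, `merge_duplicates`, `relative_error`) are hypothetical, and the construction shown, merging exact duplicate points, is only a trivial deterministic example with zero error. It is not the construction from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], weights: Sequence[float],
                centers: Sequence[Point]) -> float:
    """Weighted k-means cost: each point pays its weight times the
    squared distance to its nearest center."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def merge_duplicates(points: Sequence[Point]) -> Tuple[List[Point], List[float]]:
    """Toy deterministic 'coreset' (illustration only): collapse identical
    points into one weighted representative. The cost is preserved exactly."""
    counts = Counter(points)
    reps = list(counts)
    return reps, [float(counts[p]) for p in reps]


def relative_error(points: Sequence[Point], core_pts: Sequence[Point],
                   core_w: Sequence[float], centers: Sequence[Point]) -> float:
    """Relative gap between full cost and coreset cost for one set of centers.
    An eps-coreset keeps this at most eps for every candidate set of centers."""
    full = kmeans_cost(points, [1.0] * len(points), centers)
    approx = kmeans_cost(core_pts, core_w, centers)
    return abs(full - approx) / full if full > 0 else abs(approx)


if __name__ == "__main__":
    data = [(0.0, 0.0)] * 3 + [(1.0, 0.0)] * 2 + [(4.0, 4.0)]
    core_pts, core_w = merge_duplicates(data)
    centers = [(0.0, 0.0), (4.0, 3.0)]
    print(len(data), "points reduced to", len(core_pts))
    print("relative error:", relative_error(data, core_pts, core_w, centers))
```

A real coreset construction has to keep the relative error small for all sets of $k$ centers at once, without relying on duplicated input points; the toy above only shows what is being measured.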
Abstract
In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there are no known deterministic dimensionality reduction procedure or coreset construction that avoid an exponential dependency on the input dimension $d$, the precision parameter $\varepsilon^{-1}$ or $k$. Furthermore, there is no coreset construction that succeeds with probability $1-1/n$ and whose size does not depend on the number of input points, $n$. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov'20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension are $\Omega(1)$ for both $k$-median and…
Taxonomy
Topics: Advanced Clustering Algorithms Research
