Polylogarithmic Sketches for Clustering
Moses Charikar, Erik Waingarten

TL;DR
This paper introduces polylogarithmic-size sketches for estimating the cost of the optimal clustering of points in high-dimensional spaces, for p in [1,2]. The sketches yield streaming and distributed algorithms whose space depends sublinearly on the dimension.
Contribution
The authors develop sketches, of size polynomial in log(nd), k, and 1/epsilon, that approximate the optimal clustering cost for p in [1,2], and apply them in streaming and distributed settings.
Findings
Sketch size is $\mathrm{poly}(\log(nd), k, 1/\epsilon)$.
Provides the first sublinear dependence on $d$ for $p \in [1,2)$ in streaming and distributed algorithms.
Achieves $(1+\epsilon)$-approximation of clustering cost without recovering cluster centers.
Abstract
Given $n$ points in $\ell_p^d$, we consider the problem of partitioning points into $k$ clusters with associated centers. The cost of a clustering is the sum of $p$-th powers of distances of points to their cluster centers. For $p \in [1,2]$, we design sketches of size $\mathrm{poly}(\log(nd), k, 1/\epsilon)$ such that the cost of the optimal clustering can be estimated to within factor $1+\epsilon$, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space $\mathrm{poly}(\log(nd), k, 1/\epsilon)$. We also obtain a distributed memory algorithm, where the points are arbitrarily partitioned amongst machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior…
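To make the cost in the abstract concrete, here is a minimal Python sketch that evaluates it exactly for a fixed set of centers. It is only an illustration of the definition, not the paper's sketching method: the function name `clustering_cost` and the sample data are hypothetical, and the code reads every coordinate of every point, which is exactly the dependence on the dimension that the sketches are meant to avoid. The quantity the paper estimates is the minimum of this cost over all choices of k centers.

```python
def clustering_cost(points, centers, p):
    """Sum, over all points, of the p-th power of the l_p distance
    to the nearest center (hypothetical helper, for illustration)."""
    total = 0.0
    for x in points:
        # ||x - c||_p^p is the sum of |x_i - c_i|^p over coordinates.
        # Each point is charged to its closest center.
        total += min(
            sum(abs(xi - ci) ** p for xi, ci in zip(x, c))
            for c in centers
        )
    return total


# Toy data (made up): two well-separated groups in the plane, k = 2.
points = [(0, 0), (1, 0), (10, 10), (10, 12)]
centers = [(0, 0), (10, 10)]

cost_p1 = clustering_cost(points, centers, p=1)  # 0 + 1 + 0 + 2
cost_p2 = clustering_cost(points, centers, p=2)  # 0 + 1 + 0 + 4
```

With p = 1 this is the k-median objective and with p = 2 the k-means objective. An estimate within factor $1+\epsilon$ of the optimal cost is all the sketch provides; per the abstract, it does not retain enough information to recover the centers or the partition.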
