Streaming Euclidean $k$-median and $k$-means with $o(\log n)$ Space
Vincent Cohen-Addad, David P. Woodruff, Samson Zhou

TL;DR
This paper presents a streaming algorithm for Euclidean $k$-median and $k$-means clustering whose memory is sub-logarithmic in the input size $n$, breaking the logarithmic barrier of prior techniques while achieving a $(1+\varepsilon)$-approximation.
Contribution
It introduces the first insertion-only streaming algorithm for $(k,z)$-clustering with sub-logarithmic memory, surpassing the $O(\log n)$ barrier of prior techniques.
Findings
Achieves a $(1+\varepsilon)$-approximation with sub-logarithmic memory.
Yields a two-pass algorithm for dynamic streams using the same techniques.
Breaks the longstanding $O(\log n)$ memory barrier.
Abstract
We consider the classic Euclidean $k$-median and $k$-means objective on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal $k$-median or $k$-means solution, while using as little memory as possible. Over the last 20 years, clustering in data streams has received a tremendous amount of attention and has been the test-bed for a large variety of new techniques, including coresets, the merge-and-reduce framework, bicriteria approximation, sensitivity sampling, and so on. Despite this intense effort to obtain smaller sketches for these problems, all known techniques require storing at least $\Omega(\log(n\Delta))$ words of memory, where $n$ is the size of the input and $\Delta$ is the aspect ratio. A natural question is if one can beat this logarithmic dependence on $n$ and $\Delta$. In this paper, we break this barrier by first giving an…
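To make the objective in the abstract concrete, here is a minimal offline sketch of the $(k,z)$-clustering cost, where $z=1$ gives $k$-median and $z=2$ gives $k$-means under the usual convention. It is not the paper's streaming algorithm and uses no sketching at all; the function names `clustering_cost` and `is_within_factor` and the toy data are assumptions made for illustration only.

```python
import math

def clustering_cost(points, centers, z):
    """(k,z)-clustering cost: sum over points of the Euclidean distance
    to the nearest center, raised to the power z.
    Hypothetical helper, computed offline over the full point set."""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

def is_within_factor(cost, optimal_cost, eps):
    """Check the (1 + eps)-approximation condition: cost <= (1 + eps) * OPT.
    Hypothetical helper; optimal_cost must be supplied by the caller."""
    return cost <= (1 + eps) * optimal_cost

# Toy data (an assumption): two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]  # k = 2 candidate centers

k_median_cost = clustering_cost(points, centers, z=1)
k_means_cost = clustering_cost(points, centers, z=2)
print(k_median_cost, k_means_cost)
```

In this toy instance every point lies at distance exactly 1 from its nearest center, so both costs equal the number of points. A streaming algorithm has to certify the $(1+\varepsilon)$ condition without ever holding all of `points` in memory, which is where the space bounds discussed in the abstract come in.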
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Anomaly Detection Techniques and Applications
