Streaming Balanced Clustering
Hossein Esfandiari, Vahab Mirrokni, Peilin Zhong

TL;DR
This paper introduces the first single-pass streaming algorithm for capacitated clustering problems, including k-median and k-means, that handles both insertions and deletions in high-dimensional Euclidean space while violating capacities by at most a $1+\epsilon$ factor.
Contribution
It presents a novel single-pass streaming algorithm for balanced clustering that handles both insertions and deletions using $\mathrm{poly}(k d \log \Delta)$ space, built on a new space-decomposition technique.
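Supporting deletions in a single pass commonly relies on keeping a summary that is a linear function of the input, so that deleting a point exactly cancels its earlier insertion. The following is a minimal sketch of that generic idea only, assuming integer points bucketed into axis-aligned grid cells of a fixed side length. The class `GridCountSketch` and its methods are hypothetical names, and this is not the paper's data structure. It stores every nonempty cell exactly. An algorithm meeting a $\mathrm{poly}(k d \log \Delta)$ space bound would have to compress these counts much further.

```python
from collections import Counter

class GridCountSketch:
    """Hypothetical illustration: exact point counts per grid cell.

    Assumes integer points and a strict turnstile stream, i.e. every deletion
    removes a point that was inserted earlier. Because the counts are linear
    in the input, a deletion exactly cancels the matching insertion.
    """

    def __init__(self, cell_side: int):
        self.cell_side = cell_side
        self.counts = Counter()

    def _cell(self, point):
        # Index of the axis-aligned grid cell containing the point.
        return tuple(c // self.cell_side for c in point)

    def update(self, point, sign):
        # sign = +1 for an insertion, -1 for a deletion.
        key = self._cell(point)
        self.counts[key] += sign
        if self.counts[key] == 0:
            del self.counts[key]  # forget cells that become empty

    def nonempty_cells(self):
        return dict(self.counts)

sketch = GridCountSketch(cell_side=4)
for p in [(1, 2), (3, 3), (9, 1)]:
    sketch.update(p, +1)
sketch.update((9, 1), -1)  # delete the point inserted last
print(sketch.nonempty_cells())  # {(0, 0): 2}
```

The summary after the deletion is identical to the summary of a stream that never contained `(9, 1)`. This is the property a deletion-tolerant algorithm needs.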
Findings
Developed a space decomposition via curved half-spaces.
Designed a strong coreset of size $\mathrm{poly}(k d \log \Delta)$.
The algorithm handles insertions and deletions while violating capacities by at most a $1+\epsilon$ factor.
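To make the relaxed capacity guarantee concrete, the sketch below evaluates a given assignment. It computes the clustering cost, with exponent $z=1$ for $k$-median and $z=2$ for $k$-means. It also checks whether every center serves at most $(1+\epsilon)U$ points. It assumes a single uniform capacity $U$ shared by all $k$ centers, and the function names are hypothetical. The code only checks a solution and does not reproduce the paper's algorithm.

```python
import math

def clustering_cost(points, centers, assignment, z=1):
    # Sum of dist(p, assigned center)^z: z=1 is k-median, z=2 is k-means.
    return sum(math.dist(p, centers[a]) ** z for p, a in zip(points, assignment))

def max_load_ratio(assignment, k, capacity):
    # Largest number of points served by one center, divided by the capacity U.
    loads = [0] * k
    for a in assignment:
        loads[a] += 1
    return max(loads) / capacity

def respects_relaxed_capacity(assignment, k, capacity, eps):
    # Accept if no center serves more than (1 + eps) * U points.
    return max_load_ratio(assignment, k, capacity) <= 1 + eps

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
centers = [(0, 0), (10, 10)]
assignment = [0, 0, 0, 1, 1]
print(clustering_cost(points, centers, assignment, z=2))      # 3.0
print(respects_relaxed_capacity(assignment, 2, 2, eps=0.5))   # True (load ratio 1.5)
print(respects_relaxed_capacity(assignment, 2, 2, eps=0.25))  # False
```

In the example, the first center serves 3 points against a capacity of 2. That assignment is acceptable only when $\epsilon \ge 0.5$.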
Abstract
Clustering of data points in metric space is among the most fundamental problems in computer science, with plenty of applications in data mining, information retrieval and machine learning. Due to the necessity of clustering large datasets, several streaming algorithms have been developed for different variants of clustering problems such as the $k$-median and $k$-means problems. However, despite the importance of the context, the current understanding of balanced clustering (or more generally capacitated clustering) in the streaming setting is very limited. The only previously known streaming approximation algorithm for capacitated clustering requires three passes and only handles insertions. In this work, we develop the first single-pass streaming algorithm for a general class of clustering problems that includes capacitated $k$-median and capacitated $k$-means in Euclidean…
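For fixed centers, the capacitated objective asks for the cheapest assignment of points to centers in which no center exceeds its capacity. The brute-force sketch below enumerates all assignments and is only meant for tiny inputs. It uses a hypothetical function name and assumes a uniform hard capacity. In practice, this fixed-center subproblem is typically solved as a min-cost flow. The sketch is not the paper's streaming method.

```python
import itertools
import math

def best_capacitated_assignment(points, centers, capacity, z=1):
    """Return (cost, assignment) minimizing the sum of dist^z with every center
    serving at most `capacity` points. Brute force over all k^n assignments,
    so tiny inputs only. Returns (math.inf, None) if nothing is feasible."""
    k = len(centers)
    best_cost, best = math.inf, None
    for assignment in itertools.product(range(k), repeat=len(points)):
        # Skip assignments that overload any center (hard capacity).
        if max(assignment.count(c) for c in range(k)) > capacity:
            continue
        cost = sum(math.dist(p, centers[a]) ** z for p, a in zip(points, assignment))
        if cost < best_cost:
            best_cost, best = cost, assignment
    return best_cost, best

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
centers = [(0, 0), (10, 0)]
print(best_capacitated_assignment(points, centers, capacity=2))  # (9.0, (0, 0, 1, 1))
```

Without capacities, the three left points would all go to the center at the origin, for a total $k$-median cost of 3. With capacity 2, the point at $x=2$ is forced to the far center, and the optimal cost rises to 9.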
Taxonomy
Topics: Facility Location and Emergency Management · Data Management and Algorithms · Computational Geometry and Mesh Generation
