Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data
Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong

TL;DR
This paper gives a one-pass dynamic streaming algorithm for high-dimensional $k$-means clustering that maintains a coreset under point insertions and deletions, using space polynomial in the dimension and nearly linear in the number of centers.
Contribution
It presents the first dynamic geometric data stream algorithm for $k$-means with space polynomial in dimension and nearly linear in $k$, advancing streaming clustering methods.
Findings
Achieves one-pass coreset construction in dynamic streams.
Uses space $\tilde{O}(k \cdot \text{poly}(d, \log\Delta))$, nearly optimal in $k$.
First such algorithm with polynomial space in dimension for $k$-means.
Abstract
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space can be dynamically inserted to or deleted from the dataset. For this problem, we provide a one-pass coreset construction algorithm using space $\tilde{O}(k \cdot \text{poly}(d, \log\Delta))$, where $k$ is the target number of centers. To our knowledge, this is the first dynamic geometric data stream algorithm for $k$-means using space polynomial in dimension and nearly optimal (linear) in $k$.
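To make the setting concrete, the following is a minimal Python sketch of the dynamic stream model and of the standard $(1 \pm \varepsilon)$ coreset condition for the $k$-means cost. It is not the paper's algorithm: it stores every point explicitly, which is exactly what a small-space streaming method avoids. The names `apply_stream`, `kmeans_cost`, and `approximates_cost`, and the `'+'`/`'-'` update encoding, are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[int, ...]


def apply_stream(updates: Iterable[Tuple[str, Point]]) -> Dict[Point, int]:
    """Replay a dynamic stream of ('+', p) insertions and ('-', p) deletions.

    Returns the surviving multiset as a map point -> multiplicity.
    Naive baseline: memory grows with the number of distinct points.
    """
    counts: Counter = Counter()
    for op, p in updates:
        if op == '+':
            counts[p] += 1
        elif op == '-':
            if counts[p] <= 0:
                raise ValueError(f"deleting a point that is not present: {p}")
            counts[p] -= 1
            if counts[p] == 0:
                del counts[p]
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return dict(counts)


def kmeans_cost(weighted: Dict[Point, float], centers: Sequence[Point]) -> float:
    """Weighted k-means cost: sum of w * (squared distance to nearest center)."""
    total = 0.0
    for p, w in weighted.items():
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def approximates_cost(full: Dict[Point, float], candidate: Dict[Point, float],
                      centers: Sequence[Point], eps: float) -> bool:
    """Check the coreset condition for ONE set of centers.

    A true eps-coreset must satisfy this for every choice of k centers;
    this helper only tests the single choice it is given.
    """
    full_cost = kmeans_cost(full, centers)
    cand_cost = kmeans_cost(candidate, centers)
    return abs(cand_cost - full_cost) <= eps * full_cost
```

For example, replaying two insertions of `(1, 1)`, an insertion and later deletion of `(5, 5)`, and an insertion of `(4, 4)` leaves a multiset of three points, and `kmeans_cost` can then be compared against any weighted candidate set for a given choice of centers.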
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Topological and Geometric Data Analysis
