An Algorithm for Online K-Means Clustering
Edo Liberty, Ram Sriharsha, Maxim Sviridenko

TL;DR
This paper introduces an online algorithm for k-means clustering that outputs ~O(k) clusters whose cost is within a poly-logarithmic factor of W*, the optimal cost using k clusters. It assigns each vector to a cluster before seeing the next one and, in experiments, is not much worse than k-means++.
Contribution
The paper presents an online k-means clustering algorithm with a provable guarantee (~O(k) clusters at ~O(W*) cost) and an experimental comparison against k-means++.
Findings
Generates ~O(k) clusters with k-means cost ~O(W*), where W* is the optimal k-means cost using k clusters and ~O suppresses poly-logarithmic factors.
Is experimentally not much worse than k-means++.
Operates in a strictly more constrained computational model than k-means++: each vector must be given a cluster identifier before the next one arrives.
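For reference, the cost in these findings is the standard k-means objective. The block below writes it out for the input vectors v_1,...,v_n; the symbols C (a set of centers) and c (a single center) are notation assumed here for illustration and are not taken from the paper.

```latex
% k-means cost of a set of centers C on the input v_1, ..., v_n
W(C) = \sum_{i=1}^{n} \min_{c \in C} \lVert v_i - c \rVert_2^2

% optimal cost using k clusters
W^{*} = \min_{C \,:\, |C| = k} W(C)
```

Under this notation, the guarantee says the online algorithm uses ~O(k) clusters and incurs cost ~O(W*), with both bounds hiding poly-logarithmic factors.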
Abstract
This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v_1,...,v_n one by one in an arbitrary order. For each vector the algorithm outputs a cluster identifier before receiving the next one. Our online algorithm generates ~O(k) clusters whose k-means cost is ~O(W*). Here, W* is the optimal k-means cost using k clusters and ~O suppresses poly-logarithmic factors. We also show that, experimentally, it is not much worse than k-means++ while operating in a strictly more constrained computational model.
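The online model described above can be illustrated with a short sketch. It is not the authors' algorithm and carries none of its guarantees: it assumes a simple rule that opens a new cluster with probability proportional to the squared distance to the nearest existing center, and the names `online_cluster`, `kmeans_cost`, and the fixed parameter `facility_cost` are hypothetical, chosen only for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def online_cluster(stream, facility_cost, seed=0):
    """Assign a cluster id to each vector before looking at the next one.

    Assumption (illustrative only): a new cluster is opened with
    probability min(1, d^2 / facility_cost), where d^2 is the squared
    distance to the nearest existing center.
    """
    rng = random.Random(seed)
    centers = []  # one representative vector per opened cluster
    labels = []   # cluster id emitted for each incoming vector
    for v in stream:
        if not centers:
            # The first vector always opens the first cluster.
            centers.append(tuple(v))
            labels.append(0)
            continue
        # Find the nearest existing center and its squared distance.
        nearest, d2 = min(
            ((i, sq_dist(v, c)) for i, c in enumerate(centers)),
            key=lambda pair: pair[1],
        )
        if rng.random() < min(1.0, d2 / facility_cost):
            # Far from every center: open a new cluster at v.
            centers.append(tuple(v))
            labels.append(len(centers) - 1)
        else:
            # Close enough: the decision is final and never revisited.
            labels.append(nearest)
    return labels, centers


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

Calling `online_cluster(points, facility_cost=1.0)` returns one label per input vector along with the centers opened so far, and `kmeans_cost(points, centers)` evaluates the resulting clustering. A smaller `facility_cost` opens more clusters and lowers the cost; a larger one does the opposite, which is the clusters-versus-cost trade-off the abstract refers to.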
Taxonomy
Topics: Optimization and Search Problems · Advanced Clustering Algorithms Research · Data Management and Algorithms
