An Algorithm for Online K-Means Clustering

Edo Liberty; Ram Sriharsha; Maxim Sviridenko

arXiv:1412.5721·cs.DS·February 24, 2015·5 cites

Open Access

TL;DR

This paper introduces an online algorithm for k-means clustering that uses ~O(k) clusters to achieve a k-means cost of ~O(W*), where W* is the optimal cost with k clusters. In experiments it is not much worse than k-means++, even though it must assign each vector to a cluster before seeing the next one.

Contribution

The paper presents an online k-means clustering algorithm with a provable guarantee: it outputs ~O(k) clusters whose cost is within poly-logarithmic factors of the optimal k-cluster cost W*, while assigning each vector to a cluster as it arrives.

Findings

01

Generates ~O(k) clusters whose k-means cost is ~O(W*), where W* is the optimal k-means cost using k clusters and ~O suppresses poly-logarithmic factors.

02

Is not much worse than k-means++ in experiments, despite operating in a strictly more constrained computational model.

03

Operates strictly online: vectors arrive one by one in an arbitrary order, and a cluster identifier is output for each vector before the next one is received.

Abstract

This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v_1,...,v_n one by one in an arbitrary order. For each vector the algorithm outputs a cluster identifier before receiving the next one. Our online algorithm generates ~O(k) clusters whose k-means cost is ~O(W*). Here, W* is the optimal k-means cost using k clusters and ~O suppresses poly-logarithmic factors. We also show that, experimentally, it is not much worse than k-means++ while operating in a strictly more constrained computational model.
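The online model described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a simplified, assumed rule in which a vector opens a new cluster with probability proportional to its squared distance from the nearest existing center. The names `online_cluster`, `kmeans_cost`, and the parameter `facility_cost` are hypothetical and exist only for this illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def online_cluster(stream, facility_cost, seed=0):
    """Yield one cluster id per vector, before reading the next vector.

    Assumed illustrative rule (not taken from the paper): a vector becomes
    a new center with probability min(d^2 / facility_cost, 1), where d^2 is
    its squared distance to the nearest existing center.
    """
    rng = random.Random(seed)
    centers = []
    for v in stream:
        if centers:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(v, centers[j]))
            d2 = sq_dist(v, centers[nearest])
        else:
            nearest, d2 = None, float("inf")
        # The first vector always opens a cluster; later ones do so at random.
        if nearest is None or rng.random() < min(d2 / facility_cost, 1.0):
            centers.append(tuple(v))
            yield len(centers) - 1
        else:
            yield nearest


def kmeans_cost(vectors, labels):
    """Sum of squared distances from each vector to its cluster's mean."""
    groups = {}
    for v, label in zip(vectors, labels):
        groups.setdefault(label, []).append(v)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum(sq_dist(m, mean) for m in members)
    return total


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    labels = list(online_cluster(points, facility_cost=1.0))
    print(labels, kmeans_cost(points, labels))
```

Because `online_cluster` is a generator, each label is committed before the next vector is read, which mirrors the constraint in the abstract. The number of clusters is not fixed in advance, which is consistent with the abstract's statement that the algorithm may use more than k clusters; the trade-off in this sketch is governed entirely by the assumed `facility_cost` value.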

Taxonomy

Topics: Optimization and Search Problems · Advanced Clustering Algorithms Research · Data Management and Algorithms