A Nearly Tight Analysis of Greedy k-means++
Christoph Grunau, Ahmet Alper Özüdoğru, Václav Rozhoň, Jakub Tětek

TL;DR
This paper gives nearly tight bounds for the greedy k-means++ algorithm: it proves an approximation ratio of O(ℓ^3 log^3 k), where ℓ is the number of candidate centers sampled in each step, together with a nearly matching lower bound.
Contribution
The paper establishes the first nearly matching upper and lower bounds for the greedy k-means++ algorithm's approximation ratio, improving prior bounds significantly.
Findings
Upper bound of O(ℓ^3 log^3 k) on approximation ratio
Lower bound of Ω(ℓ^3 log^3 k / log^2(ℓ log k))
Advances theoretical understanding of greedy k-means++ guarantees
Abstract
The famous k-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the k-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following centers is then sampled with probability proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The k-means++ algorithm is known to return a Θ(log k)-approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for its following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in e.g. the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and…
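The seeding procedure described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code or Scikit-learn's implementation: the names `sq_dist` and `greedy_kmeanspp` and the parameter `ell` (candidates per step) are hypothetical. It assumes that the first center is a single uniform draw and that sampling falls back to uniform when every point already coincides with a center. The subsequent Lloyd iterations are omitted.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_kmeanspp(points, k, ell, rng=None):
    """Sketch of greedy k-means++ seeding; returns k centers taken from points.

    Assumptions (not from the paper): ell >= 1, the first center is one
    uniform draw, and ties between candidates go to the earliest one.
    """
    rng = rng or random.Random()
    # First center: sampled uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; sample uniformly instead.
            candidates = [rng.choice(points) for _ in range(ell)]
        else:
            # Sample ell candidates proportional to squared distance.
            candidates = rng.choices(points, weights=d2, k=ell)
        best_cost, best_d2, best_c = float("inf"), None, None
        for c in candidates:
            # Cost of the clustering if candidate c were added as a center.
            new_d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
            cost = sum(new_d2)
            if cost < best_cost:
                best_cost, best_d2, best_c = cost, new_d2, c
        # Greedy rule: keep the candidate that minimizes the new cost.
        centers.append(best_c)
        d2 = best_d2
    return centers
```

With `ell=1` the sketch reduces to ordinary k-means++ seeding; larger `ell` trades extra cost evaluations per step for a greedier choice, which is the variant whose approximation ratio the paper analyzes.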
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Sparse and Compressive Sensing Techniques
