A Nearly Tight Analysis of Greedy k-means++

Christoph Grunau; Ahmet Alper Özüdoğru; Václav Rozhoň; Jakub Tětek

arXiv:2207.07949 · cs.DS · July 19, 2022

Open Access

TL;DR

This paper provides nearly tight bounds for the greedy k-means++ algorithm. It shows that the approximation ratio is O(ℓ³ log³ k) and proves a lower bound that matches up to a log²(ℓ log k) factor, advancing the understanding of the algorithm's theoretical guarantees.

Contribution

The paper establishes the first nearly matching upper and lower bounds on the approximation ratio of greedy k-means++. Previously, only an Ω(ℓ log k) lower bound was known, and no upper bound had been proven.

Findings

01. Upper bound of O(ℓ³ log³ k) on the approximation ratio
02. Lower bound of Ω(ℓ³ log³ k / log²(ℓ log k)) on the approximation ratio
03. Advances the theoretical understanding of greedy k-means++ guarantees
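Dividing the upper-bound expression by the lower-bound expression shows how tight the two bounds are. They differ only by a log²(ℓ log k) factor. The two special cases below are our own illustration and are not quoted from the paper. The case ℓ = Θ(log k) corresponds, to our understanding, to scikit-learn's default of 2 + ⌊ln k⌋ candidates per step.

```latex
\begin{align*}
\frac{\ell^3 \log^3 k}{\ell^3 \log^3 k \,/\, \log^2(\ell \log k)} &= \log^2(\ell \log k), \\
\ell = \Theta(1):&\quad O(\log^3 k) \text{ vs. } \Omega\!\left(\log^3 k / \log^2 \log k\right), \\
\ell = \Theta(\log k):&\quad O(\log^6 k) \text{ vs. } \Omega\!\left(\log^6 k / \log^2 \log k\right).
\end{align*}
```

In both cases log(ℓ log k) = Θ(log log k), so the remaining gap is only a log² log k factor.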

Abstract

The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following $k - 1$ centers is then always sampled with probability proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The $k$-means++ algorithm is known to return a $\Theta(\log k)$-approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees of the following greedy variant: in every step, we sample $\ell$ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how $k$-means++ is implemented in, e.g., the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and…
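The seeding procedure described in the abstract fits in a few lines of code. The following is a minimal, dependency-free Python sketch of the greedy variant. It rests on our own assumptions:

- Points are tuples of floats.
- The cost is the sum of squared Euclidean distances to the nearest center.
- The ℓ candidates are drawn independently, with replacement.
- The helper names `greedy_kmeanspp`, `cost`, and `sq_dist` are hypothetical.

With `ell=1` the sketch reduces to plain k-means++ seeding. It is not the authors' code or scikit-learn's implementation, and it omits the subsequent Lloyd iterations.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means cost: sum over points of the squared distance to the closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def greedy_kmeanspp(points: Sequence[Point], k: int, ell: int,
                    rng: Optional[random.Random] = None) -> List[Point]:
    """Greedy k-means++ seeding (hypothetical helper, illustrative sketch).

    The first center is sampled uniformly at random. Each of the next k - 1
    steps draws `ell` candidates, each with probability proportional to its
    squared distance to the closest center so far (D^2 sampling), and keeps
    the candidate that minimizes the resulting cost. ell = 1 gives k-means++.
    """
    if rng is None:
        rng = random.Random()
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its closest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        if sum(d2) == 0:
            # Every point already coincides with a center; stop early.
            break
        # Zero-weight points (existing centers) are never drawn here.
        candidates = rng.choices(points, weights=d2, k=ell)
        best, best_d2, best_cost = None, d2, math.inf
        for cand in candidates:
            # Distances after tentatively adding `cand` as a center.
            new_d2 = [min(old, sq_dist(p, cand)) for p, old in zip(points, d2)]
            new_cost = sum(new_d2)
            if new_cost < best_cost:
                best, best_d2, best_cost = cand, new_d2, new_cost
        centers.append(best)
        d2 = best_d2
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (0.0, 10.0), (0.1, 10.0)]
    seeds = greedy_kmeanspp(pts, k=3, ell=2 + int(math.log(3)), rng=random.Random(0))
    print(seeds, cost(pts, seeds))
```

Each greedy step here takes O(ℓ·n·d) time, because the new cost of every candidate is evaluated over all n points in d dimensions. That is ℓ times the cost of a plain k-means++ step. If you want to experiment with the real library, scikit-learn exposes the number of candidates as the `n_local_trials` argument of `sklearn.cluster.kmeans_plusplus`.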

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Sparse and Compressive Sensing Techniques

Methods: Lib