A bad 2-dimensional instance for k-means++

Ragesh Jaiswal; Prachi Jain; Saumya Yadav

arXiv:1306.4207 · cs.DS · June 19, 2013

TL;DR

This paper constructs a simple 2D dataset on which the k-means++ seeding algorithm achieves a good approximation ratio only with exponentially small probability, showing that the algorithm's limitations persist even in low dimensions.

Contribution

It provides the first low-dimensional example where k-means++ has exponentially small probability of good approximation, addressing open problems in understanding its behavior.

Findings

01. Constructs a 2D dataset on which k-means++ has an exponentially small probability of achieving a good approximation.

02. Shows limitations of k-means++ in low-dimensional settings.

03. Addresses open theoretical questions about the algorithm's probabilistic guarantees.

Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: "Pick the first center randomly from among the given points. For $i > 1$, pick a point to be the $i^{th}$ center with probability proportional to the square of the Euclidean distance of this point to the nearest of the $(i - 1)$ previously chosen centers." The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log k)$ approximation in expectation, as shown by Arthur and Vassilvitskii \cite{av07}. There are datasets \cite{av07,adk09} on which this seeding algorithm gives an approximation factor of $\Omega(\log k)$ in expectation. However, it is not clear from these results if the algorithm achieves a good approximation factor with reasonably…
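The sampling procedure quoted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are hypothetical, the first center is assumed to be drawn uniformly, and "distance to the previously chosen centers" is read as distance to the nearest chosen center, which is the usual $D^2$-sampling interpretation. It does not reproduce the paper's bad instance.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (a sketch)."""
    rng = rng or random.Random()
    # First center: assumed here to be drawn uniformly from the input points.
    centers = [rng.choice(points)]
    # d2[j] holds the squared distance from points[j] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances using only the newly added center.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

Running `kmeanspp_seed` many times on a fixed dataset and comparing `kmeans_cost` against the best known cost gives an empirical estimate of how often the seeding lands within a given approximation factor, which is the quantity the paper bounds.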

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Data Management and Algorithms