A bad 2-dimensional instance for k-means++
Ragesh Jaiswal, Prachi Jain, Saumya Yadav

TL;DR
This paper constructs a simple 2D dataset demonstrating that the k-means++ seeding algorithm can have a very low probability of achieving a good approximation ratio, highlighting limitations in low-dimensional cases.
Contribution
It provides the first low-dimensional example where k-means++ has exponentially small probability of good approximation, addressing open problems about its behavior on low-dimensional datasets.
Findings
Demonstrates a 2D dataset with exponentially small probability of good approximation.
Shows limitations of k-means++ in low-dimensional settings.
Addresses open theoretical questions about the algorithm's probabilistic guarantees.
Abstract
The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: "Pick the first center uniformly at random from among the given points. For $i > 1$, pick a point to be the $i$-th center with probability proportional to the square of the Euclidean distance of this point to the nearest of the previously chosen centers." The k-means++ seeding algorithm is not only simple and fast but gives an $O(\log k)$ approximation in expectation, as shown by Arthur and Vassilvitskii \cite{av07}. There are datasets \cite{av07,adk09} on which this seeding algorithm gives an approximation factor of $\Omega(\log k)$ in expectation. However, it is not clear from these results if the algorithm achieves a good approximation factor with reasonably…
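The sampling procedure quoted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are hypothetical, and the uniform fallback for the case where every point already coincides with a chosen center is an assumption added so the sketch always returns k centers.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (k-means++ seeding)."""
    rng = rng or random.Random()
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a center, fall back
            # to a uniform pick (random.choices rejects all-zero weights).
            nxt = rng.choice(points)
        else:
            # Sample with probability proportional to squared distance.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the new center can shrink a point's nearest-center distance.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

Calling `kmeanspp_seed(points, k, random.Random(seed))` repeatedly with different seeds and comparing `kmeans_cost` across runs gives an empirical feel for how often a seeding is good, which is the probability the paper studies.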
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Data Management and Algorithms
