How the initialization affects the stability of the k-means algorithm
Sebastien Bubeck, Marina Meila, Ulrike von Luxburg

TL;DR
This paper examines how the choice of initialization affects the stability of the k-means clustering algorithm, and in particular which local optimum the algorithm ends up in, since that determines the clustering obtained in practice.
Contribution
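It analyzes the effect of initialization on the stability of the actual k-means algorithm, including its tendency to get stuck in local optima, and looks at the resulting clustering rather than only the cost of the solution. According to the authors, other papers ignore the local-optima behavior.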
Findings
The paper characterizes when different initializations lead to the same local optimum and when they lead to different local optima.
This analysis is used to prove that selecting the number of clusters based on stability scores is reasonable.
The actual k-means algorithm's properties are crucial for understanding clustering stability.
Abstract
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
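The idea of scoring a candidate number of clusters by how consistently k-means behaves across initializations can be illustrated with a minimal sketch. This is not the paper's procedure: it is a simplified illustration that varies only the initialization on one fixed sample, uses initial centers drawn uniformly from the data points, and measures disagreement with a pair-counting distance (one minus the Rand index). The function names `lloyd`, `pair_disagreement`, and `instability`, as well as the toy data, are assumptions made for this example.

```python
import random
from itertools import combinations


def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def lloyd(points, k, rng, iters=100):
    """Plain Lloyd iterations; initial centers are k distinct data points
    chosen at random (an assumed initialization scheme)."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in points]
        # Update step: each center moves to the mean of its cluster.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep center of empty cluster
        if new_centers == centers:  # converged to a local optimum
            break
        centers = new_centers
    return labels


def pair_disagreement(a, b):
    """Fraction of point pairs that one labeling puts together and the
    other separates. Invariant to renaming the cluster labels."""
    pairs = list(combinations(range(len(a)), 2))
    diff = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return diff / len(pairs)


def instability(points, k, runs=10, seed=0):
    """Average pairwise disagreement between clusterings obtained from
    `runs` independent random initializations."""
    rng = random.Random(seed)
    labelings = [lloyd(points, k, rng) for _ in range(runs)]
    dists = [pair_disagreement(a, b) for a, b in combinations(labelings, 2)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy data: three Gaussian blobs in the plane (hypothetical example).
    data_rng = random.Random(1)
    blobs = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
            for cx, cy in blobs for _ in range(20)]
    for k in range(1, 6):
        print(k, round(instability(data, k), 3))
```

A low score means that different initializations tend to end in the same clustering, while a high score means they tend to end in different local optima. The score for k = 1 is always zero, so in practice such scores are compared only across k ≥ 2 and usually need some normalization before they are used for model selection.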
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition
