How the initialization affects the stability of the k-means algorithm

Sebastien Bubeck; Marina Meila; Ulrike von Luxburg

arXiv:0907.5494 · stat.ML · August 3, 2009 · 5 citations

TL;DR

This paper examines how the initialization of the k-means algorithm determines which local optimum it reaches, and therefore how stable its clustering results are.

Contribution

Unlike prior studies, it analyzes the actual k-means algorithm, including its tendency to get stuck in local optima, and studies the resulting clusterings rather than only their costs.

Findings

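1. The paper characterizes when different initializations lead to the same local optimum and when they lead to different ones.
2. This analysis shows that selecting the number of clusters based on stability scores is justified.
3. Understanding clustering stability requires accounting for how the actual k-means algorithm behaves, including its local optima, not only the cost of its solutions.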

Abstract

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
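
The idea of choosing k by stability with respect to the initialization can be illustrated with a small sketch. It is not the authors' procedure: the farthest-first seeding from a random starting point, the minimal matching distance between clusterings, the number of runs, and the synthetic three-blob data are all assumptions chosen for illustration. For each candidate k, the sketch runs Lloyd's k-means from several random initializations and reports the average disagreement between the resulting clusterings; a score of 0 means every initialization reached the same local optimum.

```python
import itertools
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k, rng):
    """Hypothetical seeding (an assumption, not the paper's scheme): a random
    first center, then repeatedly the point farthest from all chosen centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def lloyd(points, centers, max_iter=100):
    """Lloyd's k-means iterations; returns the cluster label of each point."""
    centers = list(centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: a local optimum
            break
        labels = new_labels
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


def matching_distance(a, b, k):
    """Fraction of points on which two labelings disagree under the best
    relabeling; brute force over permutations, so only suitable for small k."""
    best = min(
        sum(perm[x] != y for x, y in zip(a, b))
        for perm in itertools.permutations(range(k))
    )
    return best / len(a)


def instability(points, k, n_runs=10, seed=0):
    """Average pairwise matching distance between k-means solutions obtained
    from n_runs random initializations (0 = all runs gave the same clustering)."""
    rng = random.Random(seed)
    runs = [lloyd(points, farthest_first_init(points, k, rng)) for _ in range(n_runs)]
    pairs = list(itertools.combinations(runs, 2))
    return sum(matching_distance(a, b, k) for a, b in pairs) / len(pairs)


def make_blobs(seed=1, n_per_blob=30, sigma=0.5):
    """Illustrative data: three well-separated Gaussian blobs in the plane."""
    rng = random.Random(seed)
    means = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.0)]
    return [(mx + rng.gauss(0, sigma), my + rng.gauss(0, sigma))
            for mx, my in means for _ in range(n_per_blob)]


if __name__ == "__main__":
    data = make_blobs()
    for k in range(2, 6):
        print(k, round(instability(data, k), 3))
```

On data like this, k = 3 should score 0: each run seeds one center per blob, and Lloyd's algorithm then recovers the same partition. For other values of k the score depends on where the extra or missing centers land, which is the kind of dependence on the initialization that the paper analyzes.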

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition