Improved seeding strategies for k-means and k-GMM

Guillaume Carrière; Frédéric Cazals

arXiv:2506.21291 · cs.LG · November 4, 2025

TL;DR

This paper introduces improved randomized seeding strategies for k-means and k-GMM clustering that yield a consistent constant-factor improvement in the final metric (SSE for k-means, log-likelihood for k-GMM) over classical seeding methods, at a modest overhead.

Contribution

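The paper formalizes the three key ingredients of randomized seeding (the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection), proposes novel lookahead and multipass strategies, and shows that they outperform classical approaches.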

Findings

01 Consistent improvement over classical seeding methods in final clustering metrics.

02 Insights into the relationship between initial seeding and final SSE.

03 Reduction in variance and sensitivity in iterative seeding methods.

Abstract

We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods that exploit a lookahead principle (conditioning the seed selection on an enhanced coherence with the final metric used to assess the algorithm) and a multipass strategy to tame the effect of randomization. Experiments show a consistent constant-factor improvement over classical contenders in terms of the final metric (SSE for k-means, log-likelihood for k-GMM), at a modest overhead. In particular, for k-means, our methods improve on the recently designed multi-swap strategy, which was the first one to outperform the greedy k-means++…
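To make the three ingredients concrete, here is a minimal sketch of the classical greedy k-means++ baseline that the abstract mentions. It is not the lookahead or multipass methods proposed in the paper. It assumes NumPy, and the names `sse`, `greedy_kmeanspp`, `n_candidates`, and `seed` are illustrative choices, not taken from the paper. Candidates are sampled with probability proportional to the squared distance to the nearest seed (sampling metric), `n_candidates` of them are drawn at each step (number of candidates), and the one giving the lowest SSE is kept (selection metric).

```python
import numpy as np


def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def greedy_kmeanspp(X, k, n_candidates=4, seed=None):
    """Greedy k-means++ seeding (illustrative sketch, not the paper's method).

    Returns an array of k seeds, each of which is a row of X.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # First seed: drawn uniformly at random.
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from point i to its nearest seed so far.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)

    for _ in range(1, k):
        total = d2.sum()
        # Sampling metric: D^2 weighting (uniform if all points coincide).
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        # Number of candidates: draw n_candidates indices.
        cand = rng.choice(n, size=n_candidates, p=probs)

        # Squared distance from every point to every candidate: shape (n, l).
        cand_d2 = ((X[:, None, :] - X[cand][None, :, :]) ** 2).sum(axis=2)
        # Nearest-seed distances if each candidate were added.
        new_d2 = np.minimum(d2[:, None], cand_d2)

        # Selection metric: keep the candidate with the lowest resulting SSE.
        best = new_d2.sum(axis=0).argmin()
        centers.append(X[cand[best]])
        d2 = new_d2[:, best]

    return np.array(centers)
```

With `n_candidates=1` the selection step is trivial and the sketch reduces to plain k-means++ seeding. The returned seeds would then be passed to Lloyd iterations, whose final SSE is the metric the abstract uses to assess k-means.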

Taxonomy

Topics: Advanced Algorithms and Applications · Advanced Measurement and Detection Methods

Methods: k-Means Clustering