K*-Means: A Parameter-free Clustering Algorithm

Louis Mahon; Mirella Lapata

arXiv:2505.11904 · cs.LG · May 20, 2025

TL;DR

K*-means is a new clustering algorithm that uses the minimum description length (MDL) principle to choose the number of clusters automatically, without requiring any parameters to be set in advance.

Contribution

It introduces a parameter-free clustering method that combines cluster splitting and merging with standard k-means updates. The method is proven to converge and, in experiments, estimates k accurately.
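The convergence guarantee can be motivated by the standard descent argument for procedures of this kind. What follows is a sketch and not the paper's proof. It assumes three things: the algorithm's state is a partition P_t of the N data points, with centroids given by the cluster means; every step that changes the state (a k-means update, a split, or a merge) strictly lowers a cost C, such as the total description length; and the algorithm halts when no step does. Here P_N denotes the set of all partitions of N points and B_N its size, the Nth Bell number.

```latex
\begin{aligned}
&C(P_0) > C(P_1) > \cdots > C(P_T), \qquad P_t \in \mathcal{P}_N, \qquad |\mathcal{P}_N| = B_N < \infty \\
&\Longrightarrow\; P_s \neq P_t \ \text{for all } s \neq t \\
&\Longrightarrow\; T + 1 \le B_N .
\end{aligned}
```

Because the costs strictly decrease, no partition can be revisited, and there are only finitely many partitions. The procedure therefore stops after at most B_N - 1 state changes. The bound is loose, but it is enough to rule out cycling.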

Findings

1. Outperforms existing methods when k is unknown
2. Accurately estimates the number of clusters
3. Scales well with dataset size and has competitive runtime

Abstract

Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, k, or by relying on thresholds that implicitly determine k. We introduce k*-means, a novel clustering algorithm that eliminates the need to set k or any other parameters. Instead, it uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters while also optimising the standard k-means objective. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown. We also show that it is accurate in estimating k, and that empirically its runtime is competitive with existing methods, and scales well with dataset size.
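The following minimal NumPy sketch makes the abstract's description concrete: it searches over splits and merges and scores each candidate with a description-length cost. It is not the authors' implementation. The following are all assumptions made for illustration:

- the cost itself: centroids stored as 32-bit floats, a uniform log2(k)-bit code for each point's label, and residuals coded under an isotropic Gaussian;
- the deterministic farthest-point initialisation used for splits;
- the hypothetical function names `mdl_cost`, `lloyd`, `split`, `merge`, and `kstar_means_sketch`.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for every point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd(X, centroids, max_iter=100):
    """Standard k-means (Lloyd) updates from the given centroids.

    Empty clusters are dropped, so the returned k may be smaller.
    """
    for _ in range(max_iter):
        labels = assign(X, centroids)
        keep = np.unique(labels)  # indices of non-empty clusters
        new = np.array([X[labels == j].mean(axis=0) for j in keep])
        if new.shape == centroids.shape and np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(X, centroids)
    keep = np.unique(labels)
    # Relabel to 0..len(keep)-1 and keep only the non-empty centroids.
    return np.searchsorted(keep, labels), centroids[keep]


def mdl_cost(X, labels, centroids, float_bits=32):
    """Assumed two-part description length in bits (not the paper's exact code).

    Model: k centroids of d floats each, plus log2(k) bits per point label.
    Data: residuals under an isotropic Gaussian with ML variance; the
    quantisation constant is identical for every k, so it is dropped.
    """
    n, d = X.shape
    k = len(centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())
    var = max(sse / (n * d), 1e-12)  # floor avoids log(0) on perfect fits
    model_bits = k * d * float_bits + n * np.log2(k)
    data_bits = 0.5 * n * d * np.log2(2 * np.pi * np.e * var)
    return model_bits + data_bits


def split(X, labels, centroids, j):
    """Candidate: replace cluster j by a 2-means split of its points."""
    pts = X[labels == j]
    if len(pts) < 2:
        return None
    # Deterministic farthest-point initialisation (an assumed heuristic).
    a = pts[((pts - centroids[j]) ** 2).sum(axis=1).argmax()]
    b = pts[((pts - a) ** 2).sum(axis=1).argmax()]
    _, sub = lloyd(pts, np.stack([a, b]))
    if len(sub) < 2:  # all points identical: nothing to split
        return None
    return lloyd(X, np.vstack([np.delete(centroids, j, axis=0), sub]))


def merge(X, labels, centroids, i, j):
    """Candidate: replace clusters i and j by the mean of their union."""
    union_mean = X[(labels == i) | (labels == j)].mean(axis=0)
    return lloyd(X, np.vstack([np.delete(centroids, [i, j], axis=0), union_mean]))


def kstar_means_sketch(X, max_rounds=100):
    """Greedy split/merge search that stops when no move lowers the cost."""
    labels, cents = lloyd(X, X.mean(axis=0, keepdims=True))  # start at k = 1
    best = mdl_cost(X, labels, cents)
    for _ in range(max_rounds):
        k = len(cents)
        cands = [split(X, labels, cents, j) for j in range(k)]
        cands += [merge(X, labels, cents, i, j)
                  for i in range(k) for j in range(i + 1, k)]
        cands = [c for c in cands if c is not None]
        if not cands:
            break
        costs = [mdl_cost(X, lab, cen) for lab, cen in cands]
        best_idx = int(np.argmin(costs))
        if costs[best_idx] >= best:  # no strict improvement: stop
            break
        (labels, cents), best = cands[best_idx], costs[best_idx]
    return labels, cents
```

Calling `kstar_means_sketch(X)` on an `(n, d)` float array returns integer labels and the selected centroids. The number of centroid rows is the estimated k. Each round refines every single-cluster split and every pairwise merge with Lloyd iterations. It then accepts the cheapest candidate only if that candidate lowers the total description length, so the loop cannot cycle. Trying all pairs costs O(k²) Lloyd runs per round, which is acceptable for a sketch but is not a scalable design.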

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Stochastic Gradient Optimization Techniques

Methods: Sparse Evolutionary Training