K*-Means: A Parameter-free Clustering Algorithm
Louis Mahon, Mirella Lapata

TL;DR
K*-means is a clustering algorithm that uses the minimum description length (MDL) principle to determine the number of clusters automatically, with no k or other parameters to set.
Contribution
The paper introduces a parameter-free clustering method that combines cluster splitting and merging with the standard k-means objective. Convergence is proven, and the accuracy of the estimated k is demonstrated experimentally.
Findings
Outperforms existing methods when k is unknown
Accurately estimates the number of clusters
Scales well with dataset size and has competitive runtime
Abstract
Clustering is a widely used and powerful machine learning technique, but its effectiveness is often limited by the need to specify the number of clusters, k, or by relying on thresholds that implicitly determine k. We introduce k*-means, a novel clustering algorithm that eliminates the need to set k or any other parameters. Instead, it uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters while also optimising the standard k-means objective. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown. We also show that it is accurate in estimating k, and that empirically its runtime is competitive with existing methods, and scales well with dataset size.
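The idea of letting a description-length cost decide when to split a cluster can be illustrated with a minimal sketch. It is not the authors' implementation. The cost formula below (a fixed 32 bits per centroid coordinate, log2(k) bits per label, and a Gaussian code for the residuals with one shared variance) is an assumed, simplified stand-in for the paper's MDL objective. The merge step is omitted, and all function names are hypothetical.

```python
import math

FLOAT_BITS = 32  # assumption: bits charged per stored centroid coordinate


def centroid(pts):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centres, iters=100):
    """Plain k-means iterations from given centres; returns non-empty clusters."""
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its old centre.
        updated = [centroid(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return [g for g in groups if g]


def description_length(groups):
    """Illustrative two-part code length in bits (assumed form, see lead-in)."""
    n = sum(len(g) for g in groups)
    d = len(groups[0][0])
    k = len(groups)
    sse = 0.0
    for g in groups:
        c = centroid(g)
        sse += sum(sq_dist(p, c) for p in g)
    var = max(sse / (n * d), 1e-12)  # shared spherical variance, floored
    model_bits = k * d * FLOAT_BITS  # cost of storing the centroids
    label_bits = n * math.log2(k)    # cost of one cluster index per point
    residual_bits = 0.5 * n * d * math.log2(2 * math.pi * math.e * var)
    return model_bits + label_bits + residual_bits


def split(group):
    """2-means on one cluster, seeded with two far-apart points."""
    c = centroid(group)
    a = max(group, key=lambda p: sq_dist(p, c))
    b = max(group, key=lambda p: sq_dist(p, a))
    return lloyd(group, [a, b])


def mdl_split_clustering(points):
    """Start with one cluster; accept splits only while the cost decreases."""
    groups = [list(points)]
    best = description_length(groups)
    while True:
        candidates = []
        for i, g in enumerate(groups):
            parts = split(g)
            if len(parts) == 2:
                trial = groups[:i] + groups[i + 1:] + parts
                # Refine all clusters jointly after the tentative split.
                trial = lloyd(points, [centroid(t) for t in trial])
                candidates.append((description_length(trial), trial))
        if not candidates:
            return groups
        cost, trial = min(candidates, key=lambda c: c[0])
        if cost >= best:
            return groups  # no split shortens the description
        best, groups = cost, trial


# Toy data: two well-separated 5x5 grids of 2-D points.
grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
points = grid + [(x + 10.0, y + 10.0) for x, y in grid]
clusters = mdl_split_clustering(points)
print(len(clusters), [len(c) for c in clusters])
```

On this toy data, the first split lowers the cost because the saving in residual bits outweighs the extra centroid and label bits. A further split of either grid does not pay for itself, so the loop stops without k ever being specified.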
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Stochastic Gradient Optimization Techniques
Methods: Sparse Evolutionary Training
