Degrees of Freedom and Model Selection for k-means Clustering
David P. Hofmeyr

TL;DR
This paper develops a new way to measure the effective degrees of freedom in k-means clustering, enabling better model selection through BIC, validated on simulated and real datasets.
Contribution
It introduces an extension of Stein's lemma to approximate the degrees of freedom in k-means, improving model selection accuracy.
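For orientation, the standard covariance definition of effective degrees of freedom and the classical form of Stein's lemma are sketched below. The notation (X, M, E, c(i), n_c) is illustrative and not taken from the paper, whose own expression is not reproduced on this page. The last line assumes all k clusters are non-empty. It shows that the derivative term alone, with assignments held fixed, gives only the naive count k·d. Because k-means fitted values jump whenever a point changes cluster, the smoothness condition of the classical lemma does not hold globally, which is presumably where an extension is needed.

```latex
% Illustrative notation (an assumption, not the paper's):
% X = M + E, with X, M in R^{n x d} and E_{ij} ~ N(0, \sigma^2) i.i.d.
% \hat{M}(X) stacks, for each row i, the centroid of the cluster c(i) that row i is assigned to.

% Covariance definition of effective degrees of freedom:
\[
  \mathrm{df}(\hat{M}) \;=\; \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \sum_{j=1}^{d}
      \operatorname{Cov}\!\left(\hat{M}_{ij}(X),\, X_{ij}\right)
\]

% Stein's lemma, valid when \hat{M} is almost differentiable in X:
\[
  \mathrm{df}(\hat{M}) \;=\; \mathbb{E}\!\left[\, \sum_{i=1}^{n} \sum_{j=1}^{d}
      \frac{\partial \hat{M}_{ij}}{\partial X_{ij}} \right]
\]

% With assignments held fixed, \hat{M}_{ij} is the mean of column j over the
% n_{c(i)} points in cluster c(i), so each partial derivative equals 1 / n_{c(i)}:
\[
  \sum_{i=1}^{n} \sum_{j=1}^{d} \frac{\partial \hat{M}_{ij}}{\partial X_{ij}}
  \;=\; \sum_{i=1}^{n} \frac{d}{n_{c(i)}} \;=\; k\,d .
\]
```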
Findings
The proposed degrees of freedom measure aligns well with empirical results.
The method is highly competitive with popular existing techniques for selecting high-quality clustering solutions.
Code implementation is available as an R package.
Abstract
This paper investigates the model degrees of freedom in k-means clustering. An extension of Stein's lemma provides an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression; however, empirical studies support the appropriateness of our proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high-quality clustering solutions. Code to implement the proposed approach…
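The model-selection workflow described in the abstract can be sketched roughly as below: fit k-means for a range of k, score each fit with a BIC-style criterion, and keep the minimiser. This is a minimal standard-library sketch, not the paper's implementation or its R package. The BIC form (spherical Gaussian errors with one unknown variance, constants dropped, penalty log n) is an assumption, and the default `df_fn` uses the naive count k·d purely as a placeholder for where an effective degrees of freedom estimate would be supplied. All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, n_init=5, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns the lowest SSE found."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)  # initialise at k distinct data points
        for _ in range(n_iter):
            # Assignment step: group each point with its nearest centre.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Update step: move each centre to its group mean (keep it if the group is empty).
            new_centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


def bic(sse, n, d, df):
    """Assumed BIC form: n*d*log(SSE/(n*d)) + df*log(n), additive constants dropped."""
    return n * d * math.log(sse / (n * d)) + df * math.log(n)


def select_k(points, k_max, df_fn=lambda k, d: k * d):
    """Score k = 1..k_max and return (best k, all scores).

    df_fn is a placeholder: the naive count k*d by default. An effective
    degrees of freedom estimate would be passed in here instead.
    """
    n, d = len(points), len(points[0])
    scores = {
        k: bic(kmeans_sse(points, k), n, d, df_fn(k, d))
        for k in range(1, k_max + 1)
    }
    return min(scores, key=scores.get), scores


# Small synthetic example: three Gaussian blobs in two dimensions.
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(30)
]
best_k, scores = select_k(data, k_max=6)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

Passing the degrees of freedom in as a function keeps the criterion separate from the complexity measure, so different choices of `df_fn` can be compared on the same fits.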
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
