Number of Clusters in a Dataset: A Regularized K-means Approach

Behzad Kamgar-Parsi; Behrooz Kamgar-Parsi

arXiv:2505.22991 · cs.LG · May 30, 2025

TL;DR

This paper provides theoretical bounds for the regularization parameter in k-means clustering, compares additive and multiplicative regularizers, and evaluates their effectiveness in identifying meaningful clusters in datasets.

Contribution

It introduces rigorous bounds for the regularization parameter under ideal cluster assumptions and analyzes the behavior of additive and multiplicative regularizers in k-means clustering.

Findings

01

Regularization bounds are derived for ideal clusters.

02

Additive and multiplicative regularizers can reduce solution ambiguity.

03

Experiments show effectiveness when clusters deviate from ideal assumptions.

Abstract

Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is an approach frequently used to find the correct number of distinct clusters in a dataset. The most common form of the regularization function is the additive linear term $λk$, where $k$ is the number of clusters and $λ$ is a positive coefficient. Currently, there are no principled guidelines for setting the critical hyperparameter $λ$. In this paper, we derive rigorous bounds for $λ$ assuming the clusters are ideal. Ideal clusters (defined as $d$-dimensional spheres with identical radii) are close proxies for k-means clusters ($d$-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with the additive regularizer often yields multiple…
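The additive criterion described in the abstract can be illustrated with a small sketch: for each candidate $k$, run k-means, record the within-cluster sum of squared distances SSE($k$), and keep the $k$ that minimizes SSE($k$) + $λk$. This is a minimal pure-Python illustration under our own assumptions, not the authors' code. The helper names (`make_ideal_clusters`, `kmeans_sse`, `choose_k`) are hypothetical; "ideal" clusters are simulated by drawing points uniformly inside equal-radius balls; k-means++ seeding with restarts stands in for whatever k-means solver the paper uses; and $λ = 100$ is hand-picked for this toy data rather than taken from the paper's bounds.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_ball(center, radius, n, rng):
    """Assumption: an 'ideal' cluster is simulated as n points uniform inside a d-ball."""
    d = len(center)
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random direction
        norm = math.sqrt(sum(x * x for x in v))
        r = radius * rng.random() ** (1.0 / d)        # radius giving uniform density in the ball
        pts.append(tuple(c + r * x / norm for c, x in zip(center, v)))
    return pts


def make_ideal_clusters(centers, radius, n_per_cluster, seed=0):
    """Hypothetical helper: one identical-radius ball per center."""
    rng = random.Random(seed)
    points = []
    for c in centers:
        points.extend(sample_ball(c, radius, n_per_cluster, rng))
    return points


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(p))
                break
        else:  # guard against floating-point round-off at the end of the scan
            centers.append(list(points[-1]))
    return centers


def kmeans_sse(points, k, n_init=10, max_iter=100, seed=0):
    """Lowest within-cluster sum of squares found by Lloyd's algorithm over n_init restarts."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            # Assignment step: index of the nearest center for each point.
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            # Update step: each center moves to the mean of its assigned points.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centers.append([sum(col) / len(members) for col in zip(*members)])
                else:
                    new_centers.append(centers[j])  # leave an empty cluster's center in place
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


def choose_k(points, lam, k_max):
    """Additive regularization: minimize SSE(k) + lam * k over k = 1..k_max."""
    scores = {k: kmeans_sse(points, k) + lam * k for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    pts = make_ideal_clusters([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
                              radius=1.0, n_per_cluster=30, seed=1)
    best_k, scores = choose_k(pts, lam=100.0, k_max=6)
    print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

In this toy setup (three unit-radius clusters whose centers are at least 10 apart), a fairly wide range of $λ$ selects $k = 3$. If $λ$ is too small, an extra cluster pays for itself by splitting one sphere; if it is too large, merging two spheres becomes cheaper than keeping them apart. The paper's stated goal is to bound that admissible range rigorously for ideal clusters.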

Taxonomy

Topics: Advanced Clustering Algorithms Research · Stochastic Gradient Optimization Techniques · Bayesian Methods and Mixture Models