Number of Clusters in a Dataset: A Regularized K-means Approach
Behzad Kamgar-Parsi, Behrooz Kamgar-Parsi

TL;DR
This paper provides theoretical bounds for the regularization parameter in k-means clustering, compares additive and multiplicative regularizers, and evaluates their effectiveness in identifying meaningful clusters in datasets.
Contribution
It introduces rigorous bounds for the regularization parameter under ideal cluster assumptions and analyzes the behavior of additive and multiplicative regularizers in k-means clustering.
Findings
Regularization bounds are derived for ideal clusters.
Additive and multiplicative regularizers can reduce solution ambiguity.
Experiments show effectiveness when clusters deviate from ideal assumptions.
Abstract
Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is a frequently used approach for finding the correct number of distinct clusters in a dataset. The most common formulation of the regularization function is the additive linear term λk, where k is the number of clusters and λ is a positive coefficient. Currently, there are no principled guidelines for setting a value for the critical hyperparameter λ. In this paper, we derive rigorous bounds for λ assuming clusters are ideal. Ideal clusters (defined as d-dimensional spheres with identical radii) are close proxies for k-means clusters (d-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with additive regularizer often yields multiple…
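As a rough illustration of the additive formulation described in the abstract, the sketch below picks the number of clusters k that minimizes the k-means sum of squared errors plus a penalty λk. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`sq_dist`, `farthest_point_init`, `kmeans_sse`, `select_k`), the deterministic farthest-point seeding, the toy dataset, and the value of `lam` are all chosen here for illustration. It does not use the bounds derived in the paper, and it omits the multiplicative regularizer because this summary does not state its form.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_sse(points, k, iters=50):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def select_k(points, lam, k_max):
    """Return the k in 1..k_max minimizing SSE(k) + lam * k, plus all costs."""
    costs = {k: kmeans_sse(points, k) + lam * k for k in range(1, k_max + 1)}
    return min(costs, key=costs.get), costs

# Toy data (hypothetical): three tight, well-separated groups of four points.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
          for dx, dy in offsets]

best_k, costs = select_k(points, lam=5.0, k_max=6)
print(best_k, costs)
```

On this toy data, a moderate λ selects the three groups, while a very small λ favors more clusters and a very large λ collapses everything into one. That sensitivity is why principled bounds on λ matter.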
Taxonomy
Topics: Advanced Clustering Algorithms Research · Stochastic Gradient Optimization Techniques · Bayesian Methods and Mixture Models
