Entropy regularization in probabilistic clustering
Beatrice Franzolini, Giovanni Rebaudo

TL;DR
This paper introduces an entropy regularization method for Bayesian clustering that improves interpretability by balancing cluster sizes, addressing the common issue of unbalanced partitions in nonparametric mixture models.
Contribution
It proposes a novel entropy-regularized Bayesian estimator of the clustering configuration that reduces the number of sparsely populated clusters and enhances interpretability, and that can be applied to any posterior distribution.
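A common way to quantify how balanced a partition is, and one natural reading of "entropy" in this context, is the Shannon entropy of the cluster proportions n_j / n. The sketch below is an illustration under that assumption: the function name `partition_entropy` is hypothetical, and the exact regularization term used by the authors is not reproduced here.

```python
import math
from collections import Counter

def partition_entropy(labels):
    """Shannon entropy (in nats) of the cluster-size proportions n_j / n.

    Hypothetical helper for illustration; higher values mean the
    observations are spread more evenly across clusters.
    """
    n = len(labels)
    counts = Counter(labels).values()
    return -sum((c / n) * math.log(c / n) for c in counts)

# Two partitions of 15 observations.
balanced = [0] * 5 + [1] * 5 + [2] * 5      # sizes 5, 5, 5
unbalanced = [0] * 12 + [1] + [2] + [3]     # sizes 12, 1, 1, 1

print(round(partition_entropy(balanced), 4))
print(round(partition_entropy(unbalanced), 4))
```

For K clusters the entropy is at most log K, attained when all clusters have equal size. The second partition has more clusters (4 versus 3) but lower entropy, because a single cluster dominates and the rest are singletons.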
Findings
Reduces cluster-size imbalance in Bayesian clustering
Enhances interpretability of clustering results
Provides a computationally convenient correction method
Abstract
Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often has unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. As a result, the findings are often uninterpretable unless we are willing to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as a penalized likelihood, we show how the imbalance can be explained as a direct consequence of the cost functions involved in estimating the partition. In light of our findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely populated clusters and enhances interpretability. The procedure takes the…
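The "penalized likelihood" reading can be illustrated with one standard prior on partitions. Assuming, for this sketch only, a Dirichlet process with concentration alpha (the paper's setting may be broader), the log posterior of a partition with cluster sizes n_1, ..., n_K equals the log-likelihood plus the log EPPF, log[alpha^K * Gamma(alpha) / Gamma(alpha + n) * prod_j Gamma(n_j)], up to a constant. The code evaluates only this prior "penalty" for two partitions of 15 observations; the function name `dp_log_eppf` is hypothetical.

```python
import math

def dp_log_eppf(sizes, alpha):
    """Log EPPF of a Dirichlet process with concentration alpha.

    log p(n_1, ..., n_K) = K log(alpha) + log Gamma(alpha) - log Gamma(alpha + n)
                           + sum_j log Gamma(n_j)
    (Hypothetical helper; assumes a DP prior on the partition.)
    """
    n = sum(sizes)
    k = len(sizes)
    return (k * math.log(alpha)
            + math.lgamma(alpha) - math.lgamma(alpha + n)
            + sum(math.lgamma(nj) for nj in sizes))

balanced = [5, 5, 5]         # three equal clusters
unbalanced = [12, 1, 1, 1]   # one dominant cluster plus three singletons

for alpha in (0.5, 1.0, 2.0):
    # Positive difference: the prior term alone prefers the unbalanced partition.
    diff = dp_log_eppf(unbalanced, alpha) - dp_log_eppf(balanced, alpha)
    print(alpha, round(diff, 3))
```

Here the difference equals log(alpha) + log(11!) - 3 log(4!), which is roughly log(alpha) + 7.97 and so positive for all three values of alpha: the sum of log Gamma(n_j) rewards one large cluster far more than several medium ones. This is one concrete instance of the "rich-get-richer" cost that a balancing correction would counteract.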
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Advanced Clustering Algorithms Research
