Regularization and Optimization in Model-Based Clustering
Raphael Araujo Sampaio, Joaquim Dias Garcia, Marcus Poggi, Thibaut Vidal

TL;DR
This paper introduces improved optimization and regularization techniques for Gaussian Mixture Models, significantly enhancing cluster recovery and revealing complex data structures beyond traditional k-means methods.
Contribution
It develops new algorithms and regularization strategies for GMMs, enabling better clustering performance and overfitting prevention, with open-source Julia packages provided.
Findings
Combining optimization and regularization yields superior clustering results.
Enhanced algorithms outperform existing methods in recovering complex clusters.
Open-source tools facilitate broader application of advanced GMM techniques.
Abstract
Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estimating a quadratic number of parameters per cluster to represent the covariance matrices. This poses two main issues: (i) the underlying optimization problems are challenging due to their larger number of local minima, and (ii) their solutions can overfit the data. In this work, we design search strategies that circumvent both issues. We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting. Through…
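To make the two issues in the abstract concrete, the following is a minimal sketch, not the authors' method or their Julia packages. It counts the free parameters of one full-covariance Gaussian component and applies a generic shrinkage toward a spherical covariance as one possible regularizer. The function names `full_covariance_params` and `shrink_covariance`, the shrinkage weight `lam`, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def full_covariance_params(d: int) -> int:
    """Free parameters of one Gaussian component in d dimensions:
    d mean entries plus d(d+1)/2 entries of a symmetric covariance."""
    return d + d * (d + 1) // 2


def shrink_covariance(sigma: np.ndarray, lam: float) -> np.ndarray:
    """Illustrative regularizer (an assumption, not the paper's):
    blend sigma with a spherical target of equal trace."""
    d = sigma.shape[0]
    target = (np.trace(sigma) / d) * np.eye(d)
    return (1.0 - lam) * sigma + lam * target


rng = np.random.default_rng(0)
d = 10

# A cluster with fewer points than dimensions: its empirical
# covariance is rank-deficient, a simple symptom of overfitting.
points = rng.normal(size=(5, d))
sigma = np.cov(points, rowvar=False, bias=True)
sigma_reg = shrink_covariance(sigma, lam=0.1)

# Smallest eigenvalues before and after shrinkage.
min_eig_raw = np.linalg.eigvalsh(sigma).min()
min_eig_reg = np.linalg.eigvalsh(sigma_reg).min()

print(full_covariance_params(d))
print(min_eig_raw, min_eig_reg)
```

In this sketch the raw covariance has a smallest eigenvalue of numerically zero, so its Gaussian density is degenerate, while the shrunk matrix is positive definite and keeps the same trace. A spherical model, by contrast, needs only the d mean entries plus one shared variance, which is the restriction the abstract attributes to k-means variants.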
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Advanced Clustering Algorithms Research
