An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture based clustering
Christian Hennig, Pietro Coretto

TL;DR
This paper presents a new method for selecting the number of clusters in Gaussian mixture models that accounts for noise and assesses cluster quality using a nonparametric measure, improving clustering robustness.
Contribution
It introduces an adequacy-based approach combining a nonparametric quality measure with model simplicity criteria, applicable to OTRIMLE and other clustering methods.
Findings
The proposed method effectively determines the number of clusters in simulations.
It compares favorably with the BIC and ICL criteria on real datasets.
The approach handles noise and non-Gaussian clusters well.
Abstract
We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto and Hennig 2016) of a Gaussian mixture model allowing for observations to be classified as "noise", but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic that measures how close the within-cluster distributions are to elliptical unimodal distributions that have the only mode in the mean. This nonparametric measure allows for non-Gaussian clusters as long as they have a good quality according to this statistic. The simplicity of a model is assessed by a measure that prefers a smaller number of clusters unless additional clusters can reduce the estimated noise proportion substantially. The simplest model is then chosen that is adequate for the data in the sense…
