Goodness-of-fit testing of the distribution of posterior classification probabilities for validating model-based clustering
Salima El Kolei (CREST), Matthieu Marbac (LMBA)

TL;DR
This paper introduces a novel internal goodness-of-fit test for model-based clustering. The test evaluates the validity of the posterior classification probabilities without requiring external labels, and it remains applicable to high-dimensional data.
Contribution
The paper develops a new empirical-likelihood-based procedure for testing the distribution of posterior classification probabilities in clustering. The procedure avoids the curse of dimensionality and does not need external labels for validation.
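The summary does not spell out the paper's empirical likelihood statistic. As a hedged illustration of the general technique only, the sketch below implements the classical (Owen-style) empirical likelihood ratio test for a scalar mean. The helper name `el_mean_statistic` and the simulated data are hypothetical, and the paper's actual procedure for posterior probabilities is not reproduced here.

```python
import math

import numpy as np


def el_mean_statistic(x, mu, tol=1e-12, max_iter=200):
    """Empirical likelihood ratio statistic -2 log R(mu) for H0: E[X] = mu.

    Hypothetical helper illustrating a generic scalar-mean construction;
    it is NOT the statistic defined in the paper.
    """
    z = np.asarray(x, dtype=float) - mu
    if z.min() >= 0 or z.max() <= 0:
        # mu lies outside the convex hull of the data: the EL ratio is zero.
        return math.inf
    # The multiplier must keep every weight positive: 1 + lam * z_i > 0.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    # g(lam) = sum z_i / (1 + lam * z_i) is strictly decreasing on (lo, hi),
    # so bisection finds its unique root.
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        g = np.sum(z / (1.0 + lam * z))
        if g > 0:
            lo = lam  # root lies to the right
        else:
            hi = lam  # root lies to the left
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    # Weights are w_i = 1 / (n (1 + lam z_i)), hence -2 log R = 2 sum log(1 + lam z_i).
    return 2.0 * np.sum(np.log1p(lam * z))


# Usage on simulated data (hypothetical settings).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=200)
stat = el_mean_statistic(x, mu=0.0)
# Asymptotic chi-square(1) tail probability: P(chi2_1 > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))
```

Bisection is a deliberate choice here: the score equation in the multiplier is strictly decreasing on the interval where all empirical likelihood weights stay positive, so a unique root exists whenever the hypothesised mean lies strictly inside the range of the data.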
Findings
The method tests the validity of posterior classification probabilities, including in high-dimensional clustering.
It circumvents the curse of dimensionality by working on the classification simplex, whose dimension is fixed by the number of clusters rather than by the number of variables.
Based on a vector of chi-square-like statistics, the test can asymptotically detect any alternative hypothesis.
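To make the dimension argument concrete, here is a minimal sketch, assuming an isotropic Gaussian mixture with known parameters (the function name `posterior_probabilities` and the simulation settings are hypothetical; in practice the parameters would be replaced by consistent estimates). It only computes posterior classification probabilities and does not implement the paper's test. Whatever the number of variables d, each observation is mapped to a vector of K probabilities summing to one.

```python
import numpy as np


def posterior_probabilities(X, weights, means, sigma=1.0):
    """Posterior classification probabilities tau_k(x) for an isotropic Gaussian mixture.

    Assumption for illustration: components N(means[k], sigma^2 I) with known parameters.
    """
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    # Squared distances ||x_i - mu_k||^2, shape (n, K).
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # log(pi_k) + log f_k(x_i), dropping the -d/2 log(2 pi sigma^2) term shared by all k.
    log_num = np.log(weights)[None, :] - sq_dist / (2.0 * sigma**2)
    # Log-sum-exp style normalisation for numerical stability in high dimension.
    log_num -= log_num.max(axis=1, keepdims=True)
    tau = np.exp(log_num)
    return tau / tau.sum(axis=1, keepdims=True)


# Simulated example: d = 500 variables, K = 3 clusters (hypothetical settings).
rng = np.random.default_rng(1)
d, K, n = 500, 3, 100
means = rng.normal(scale=0.2, size=(K, d))
weights = np.array([0.5, 0.3, 0.2])
labels = rng.choice(K, size=n, p=weights)
X = means[labels] + rng.normal(size=(n, d))
tau = posterior_probabilities(X, weights, means)
print(tau.shape)  # (100, 3): K coordinates, whatever the value of d
```

Because `tau` always has K columns that sum to one, any statistic built from it works in the fixed (K-1)-dimensional simplex rather than in the d-dimensional data space.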
Abstract
We present the first method for assessing the relevance of a model-based clustering result in a general framework. Standard validation criteria, like the adjusted Rand index, rely on external labels to assess partition accuracy; consequently, they are inapplicable to real-world clustering problems where labels are missing. In contrast, our method offers an internal goodness-of-fit diagnostic, since it evaluates the validity of the clustering mechanism by testing the specification of the posterior probabilities of classification defined on the unit simplex. Because the dimension of this simplex is fixed by the number of clusters, the procedure naturally circumvents the curse of dimensionality, making it applicable to high-dimensional data where traditional density-based tests fail. The testing procedure requires only a consistent estimator of the parameters and the associated posterior…
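For reference, the standard definitions behind the abstract are written out below in notation chosen here (it may differ from the paper's): a K-component mixture density, the posterior probability of classification into component k, and the unit simplex on which the vector of posterior probabilities lives. The test concerns the distribution of this vector, evaluated at a consistent estimator of the parameters.

```latex
f(\mathbf{x};\boldsymbol\theta) = \sum_{k=1}^{K} \pi_k \, f_k(\mathbf{x};\boldsymbol\alpha_k),
\qquad
\tau_k(\mathbf{x};\boldsymbol\theta) = \frac{\pi_k \, f_k(\mathbf{x};\boldsymbol\alpha_k)}{\sum_{j=1}^{K} \pi_j \, f_j(\mathbf{x};\boldsymbol\alpha_j)},

\boldsymbol\tau(\mathbf{x};\boldsymbol\theta) = \big(\tau_1(\mathbf{x};\boldsymbol\theta), \dots, \tau_K(\mathbf{x};\boldsymbol\theta)\big)
\in \mathcal{S}_{K-1} = \Big\{ \mathbf{t} \in [0,1]^K : \textstyle\sum_{k=1}^{K} t_k = 1 \Big\}.
```

Whatever the dimension of \(\mathbf{x}\), the vector \(\boldsymbol\tau\) has K coordinates constrained to sum to one, so it varies in a space of dimension K-1 only.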
