Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score
Luca Coraggio, Pietro Coretto

TL;DR
This paper introduces a unifying approach for selecting the number of clusters, models, and algorithms based on quadratic discriminant scores, providing a robust criterion for comparing diverse clustering solutions.
Contribution
It develops quadratic score-based criteria for cluster validation, applicable to a broad class of distributions, and proposes a bootstrap-based selection rule for optimal clustering solutions.
Findings
The quadratic scores are shown to be consistent with groups generated from elliptically symmetric distributions.
The method can compare partitions from different clustering algorithms.
Numerical experiments show overall superior performance of the proposed approach.
Abstract
Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithm tunings. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There exists an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. Moreover, they are often restricted to operate on partitions obtained from a specific method. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant score function and parameters describing clusters' size, center and scatter. We develop two cluster-quality criteria called quadratic scores. We show that these criteria are consistent with groups generated from a general…
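To make the reference cluster concept concrete, the sketch below computes the textbook quadratic discriminant score, log(pi_k) - 0.5 * log det(Sigma_k) - 0.5 * (x - mu_k)' Sigma_k^{-1} (x - mu_k), where pi_k, mu_k and Sigma_k stand for a cluster's size, center and scatter. The function names `quadratic_scores` and `average_best_score` are hypothetical, the parameters are plain sample estimates taken from a given partition, and averaging each point's best score is only an illustrative quality measure. It should not be read as the paper's exact criteria or its bootstrap selection rule.

```python
import numpy as np

def quadratic_scores(X, labels):
    """Return an (n, K) array of quadratic discriminant scores.

    Each cluster k is summarised by its size pi_k (proportion),
    center mu_k (sample mean) and scatter Sigma_k (sample covariance),
    all estimated from the supplied partition (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    scores = np.empty((X.shape[0], cluster_ids.size))
    for j, k in enumerate(cluster_ids):
        Xk = X[labels == k]
        pi_k = Xk.shape[0] / X.shape[0]
        mu_k = Xk.mean(axis=0)
        sigma_k = np.atleast_2d(np.cov(Xk, rowvar=False))
        _, logdet = np.linalg.slogdet(sigma_k)
        diff = X - mu_k
        # Squared Mahalanobis distance of every point to cluster k.
        maha = np.sum(diff * np.linalg.solve(sigma_k, diff.T).T, axis=1)
        scores[:, j] = np.log(pi_k) - 0.5 * logdet - 0.5 * maha
    return scores

def average_best_score(X, labels):
    """Illustrative quality measure: mean of each point's highest score."""
    return quadratic_scores(X, labels).max(axis=1).mean()

# Toy data: two well-separated Gaussian groups in the plane.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=6.0, scale=1.0, size=(100, 2)),
])
true_labels = np.repeat([0, 1], 100)
random_labels = rng.permutation(true_labels)

good = average_best_score(X, true_labels)
bad = average_best_score(X, random_labels)
print(f"true partition: {good:.3f}, random partition: {bad:.3f}")
```

Because the score depends only on the partition and the data, the same function can be applied to partitions produced by any clustering method, which is what makes comparisons across algorithms possible in principle.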
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Statistical Methods and Bayesian Inference
