Benchmarking of Clustering Validity Measures Revisited
Connor Simpson, Ricardo J. G. B. Campello, Elizabeth Stojanovski

TL;DR
This paper provides a comprehensive benchmark of 26 internal clustering validity indexes using a new methodology and a large dataset collection, offering insights into their performance across diverse clustering scenarios.
Contribution
It introduces an improved evaluation framework for clustering validity indexes and applies it to a large, diverse dataset collection, enhancing the understanding of index behaviors.
Findings
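An enhanced revision of the Vendramin et al. (2010) evaluation methodology, made up of three complementary sub-methodologies.
Benchmark results for 26 internal validity indexes across diverse datasets and algorithms.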
Insights into index performance and behavior.
Abstract
Validation plays a crucial role in the clustering process. Many different internal validity indexes exist for the purpose of determining the best clustering solution(s) from a given collection of candidates, e.g., as produced by different algorithms or different algorithm hyper-parameters. We present a comprehensive benchmark of 26 internal validity indexes, which includes highly popular classic indexes as well as more recently developed ones. We adopted an enhanced revision of the methodology presented in Vendramin et al. (2010), developed here to address several shortcomings of this previous work. The new approach consists of three complementary custom-tailored evaluation sub-methodologies, each of which has been designed to assess specific aspects of an index's behaviour while preventing potential biases of the other sub-methodologies. Each…
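To make the selection task in the abstract concrete, the following is a minimal sketch of how an internal validity index can rank a collection of candidate partitions. It uses the mean silhouette width purely as an illustration; this excerpt does not list which 26 indexes were benchmarked, and the toy data, the candidate labelings, and the helper names `silhouette` and `select_best` are assumptions made for this example, not part of the paper's methodology.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a partition (higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def select_best(points, candidates, index=silhouette):
    """Score every candidate partition and return the top-ranked one."""
    scores = {name: index(points, labels) for name, labels in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: three well separated groups in 2-D.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.0),
          (10.0, 0.0), (10.1, 0.2), (10.2, 0.0)]

# Hypothetical candidates, e.g. from runs with different numbers of clusters.
candidates = {
    "k=2": [0, 0, 0, 1, 1, 1, 1, 1, 1],
    "k=3": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "k=4": [0, 0, 3, 1, 1, 1, 2, 2, 2],
}

best, scores = select_best(points, candidates)
print(best, scores)
```

Because the index is passed as a function, other internal indexes could be swapped in through the `index` argument and their rankings of the same candidates compared, which is the kind of behaviour a benchmark of validity indexes examines.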
