On the Use of Relative Validity Indices for Comparing Clustering Approaches
Luke W. Yerbury, Ricardo J. G. B. Campello, G. C. Livingston Jr, Mark Goldsworthy, Lachlan O'Neil

TL;DR
This paper critically examines the use of Relative Validity Indices for selecting similarity paradigms in clustering, revealing fundamental limitations and recommending alternative validation methods based on external criteria and domain knowledge.
Contribution
It provides the first comprehensive empirical and theoretical analysis of RVIs for similarity paradigm selection, highlighting their unsuitability and proposing more rigorous validation approaches.
Findings
RVIs are unreliable for similarity paradigm selection.
Fundamental conceptual limitations undermine RVI use.
External validation is recommended for more reliable similarity paradigm (SP) selection.
Abstract
Relative Validity Indices (RVIs) such as the Silhouette Width Criterion and Davies-Bouldin indices are the most widely used tools for evaluating and optimising clustering outcomes. Traditionally, their ability to rank collections of candidate dataset partitions has been used to guide the selection of the number of clusters, and to compare partitions from different clustering algorithms. However, there is a growing trend in the literature to use RVIs when selecting a Similarity Paradigm (SP) for clustering, that is, the combination of normalisation procedure, representation method, and distance measure that determines how the object dissimilarities used in clustering are computed. Despite the growing prevalence of this practice, there has been no empirical or theoretical investigation into the suitability of RVIs for this purpose. Moreover, since RVIs are computed using object dissimilarities, it…
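The dependency the abstract points to, that an RVI is computed from the very dissimilarities a similarity paradigm changes, can be illustrated with a small sketch. The code below is not taken from the paper: the toy data, the fixed partition, and the helper names (`min_max_normalise`, `silhouette_width`) are assumptions made for illustration. It scores one unchanged partition with the standard Silhouette Width Criterion under two SPs (raw features versus min-max normalised features, both with Euclidean distance).

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def min_max_normalise(data):
    """Rescale each feature (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in data
    ]


def silhouette_width(data, labels, dist):
    """Mean silhouette over all objects; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab_i in enumerate(labels):
        own = clusters[lab_i]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for singletons
        # a(i): mean dissimilarity to the other members of i's own cluster
        a = sum(dist(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean dissimilarity to any other cluster
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)


# Hypothetical toy data: feature 1 separates the groups, feature 2 has a
# much larger scale and does not.
data = [[1.0, 100.0], [1.2, 300.0], [5.0, 120.0], [5.2, 320.0]]
labels = [0, 0, 1, 1]  # the partition is held fixed across both SPs

raw_score = silhouette_width(data, labels, euclidean)
norm_score = silhouette_width(min_max_normalise(data), labels, euclidean)

print("SP 1 (raw, Euclidean):       ", round(raw_score, 3))
print("SP 2 (min-max, Euclidean):   ", round(norm_score, 3))
```

With this toy data the same partition receives a negative silhouette under the raw features and a positive one after normalisation. The partition itself did not change; only the dissimilarities did, which is why comparing RVI values across SPs is a different task from comparing partitions within a single SP.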
Taxonomy
Topics: Advanced Clustering Algorithms Research
Methods: Semi-Pseudo-Label · Attentive Walk-Aggregating Graph Neural Network
