TL;DR
This paper introduces a new internal cluster validity index (CVI), the Distance-based Separability Index (DSI), which effectively evaluates clustering quality without true labels, and compares it with existing indices across multiple datasets.
Contribution
The paper proposes the DSI, a novel CVI based on data separability, and provides a comprehensive comparison with existing CVIs using real and synthetic datasets.
Findings
DSI is effective and competitive with other CVIs.
DSI outperforms some existing indices in clustering evaluation.
The paper introduces a new method, rank difference, for comparing CVI results.
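The summary names rank difference but does not define it. One plausible reading, sketched here purely as an assumption, works in three steps. First, rank a set of clustering results by a CVI. Second, rank the same results by a reference score computed with true labels, such as an external index. Third, sum the absolute differences between the two rankings, so that 0 means the CVI orders the results exactly as the reference does. The names `ranks` and `rank_difference` are hypothetical, and the paper's actual procedure may differ.

```python
# Hypothetical reading of "rank difference"; not taken from the paper.

def ranks(scores):
    """Return 1-based ranks, 1 = highest score; ties keep input order."""
    # sorted() is stable, so equal scores are ranked in their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def rank_difference(cvi_scores, reference_scores):
    """Sum of absolute rank gaps between a CVI and a reference score.

    Assumes higher is better for both inputs; negate the scores of
    lower-is-better indices (e.g. Davies-Bouldin) before calling.
    """
    if len(cvi_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    return sum(abs(a - b) for a, b in zip(ranks(cvi_scores), ranks(reference_scores)))


# Three clustering results scored by a CVI and by a label-based reference.
cvi = [0.9, 0.5, 0.7]
reference = [0.8, 0.6, 0.3]
print(ranks(cvi))                       # → [1, 3, 2]
print(rank_difference(cvi, reference))  # → 2
```

Summing absolute gaps is the simplest choice. A squared-gap or Spearman-style variant would penalize large disagreements more heavily.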
Abstract
Evaluating clustering results is a significant part of cluster analysis. In typical unsupervised learning, there are no true class labels for clustering. Thus, a number of internal evaluations, which use predicted labels and data, have been created; they are also called internal cluster validity indices (CVIs). Without true labels, designing an effective CVI is not simple, because it is similar to creating a clustering method. Having more CVIs is crucial, because no universal CVI can be used to measure all datasets, and there is no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying more CVIs to evaluate clustering results is necessary. In this paper, we propose a novel CVI, called the Distance-based Separability Index (DSI), based on a data separability measure. We applied the DSI and eight other internal CVIs including early studies from…
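The summary says only that DSI is built on a data separability measure. The minimal Python sketch below rests on an assumed definition, taken from our reading of the authors' separability measure, which should be checked against the paper. Under that assumption, DSI averages a per-cluster score over all clusters. Each cluster's score is the two-sample Kolmogorov-Smirnov (KS) statistic between two distance sets:

- ICD (intra-cluster distances): all pairwise distances inside the cluster.
- BCD (between-cluster distances): distances from the cluster's points to every point outside it.

The function names `ks_statistic` and `dsi` are illustrative and are not from any released code.

```python
import bisect
import math
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The supremum of |F_a - F_b| is reached at one of the sample values.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect.bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gap = max(gap, abs(fa - fb))
    return gap


def dsi(points, labels):
    """Sketch of DSI (assumed definition): mean per-cluster KS(ICD, BCD)."""
    scores = []
    for c in sorted(set(labels)):
        inside = [p for p, lab in zip(points, labels) if lab == c]
        outside = [p for p, lab in zip(points, labels) if lab != c]
        # ICD: all pairwise Euclidean distances within cluster c.
        icd = [math.dist(p, q) for p, q in combinations(inside, 2)]
        # BCD: distances from each point in c to each point outside c.
        bcd = [math.dist(p, q) for p in inside for q in outside]
        if icd and bcd:  # skip singleton clusters, which have no ICD
            scores.append(ks_statistic(icd, bcd))
    if not scores:
        raise ValueError("need at least two clusters, one with two or more points")
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated = [0, 0, 0, 1, 1, 1]  # labels match the two blobs
mixed = [0, 1, 0, 1, 0, 1]      # labels cut across the blobs
print(dsi(points, separated))  # → 1.0
print(dsi(points, mixed))      # lower, since ICD and BCD overlap
```

Under this reading, a value near 1 means the two distance distributions barely overlap (well-separated clusters), and a value near 0 means they are indistinguishable. Pure Python keeps the sketch dependency-free. On real data, a vectorized distance computation and `scipy.stats.ks_2samp` would be much faster.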
