A Distance-based Separability Measure for Internal Cluster Validation
Shuyue Guan, Murray Loew

TL;DR
This paper introduces a new internal cluster validation index called the Distance-based Separability Index (DSI), which effectively evaluates clustering quality without true labels, and compares it with existing CVIs across multiple datasets.
Contribution
The paper proposes the DSI, a novel data separability-based CVI, and provides a comprehensive comparison with existing indices using diverse datasets and clustering algorithms.
Findings
DSI is effective and competitive with existing CVIs.
DSI outperforms some traditional indices on several of the datasets tested.
The paper offers a process for evaluating and comparing CVIs.
Abstract
Evaluating clustering results is a significant part of cluster analysis. Because typical unsupervised learning provides no true class labels, many internal cluster validity indices (CVIs), which use only the predicted labels and the data, have been created. Without true labels, designing an effective CVI is as difficult as creating a clustering method. Having more CVIs is crucial because no universal CVI can measure all datasets, and no specific method exists for selecting a proper CVI for clusters without true labels. It is therefore necessary to apply a variety of CVIs when evaluating clustering results. In this paper, we propose a novel internal CVI, the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight internal CVIs, ranging from the early Dunn index (1974) to the recent CVDD (2019) and an…
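The abstract does not spell out how the DSI is computed. The sketch below assumes one common reading of a distance-based separability measure: for each cluster, collect the intra-cluster distances and the between-cluster distances, take the two-sample Kolmogorov-Smirnov statistic between those two sets, and average over clusters. That formulation, the Euclidean metric, and the names `ks_statistic` and `dsi` are assumptions made for illustration, not details confirmed by this summary.

```python
import math
from bisect import bisect_right
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def dsi(points, labels):
    """Assumed DSI form: mean over clusters of KS(intra-cluster, between-cluster distances)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for lab, members in clusters.items():
        others = [p for l, ps in clusters.items() if l != lab for p in ps]
        # Intra-cluster distances: all pairs inside this cluster.
        icd = [math.dist(p, q) for p, q in combinations(members, 2)]
        # Between-cluster distances: this cluster against every other point.
        bcd = [math.dist(p, q) for p in members for q in others]
        if icd and bcd:  # singleton clusters have no intra-cluster pairs
            scores.append(ks_statistic(icd, bcd))

    if not scores:
        raise ValueError("need at least two clusters, one with two or more points")
    return sum(scores) / len(scores)


# Two well-separated blobs, scored under a sensible and a scrambled labeling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

good_score = dsi(points, good_labels)
bad_score = dsi(points, bad_labels)
print(good_score, bad_score)
```

Under this reading, a higher score means the two distance distributions differ more, so the labeling that matches the blobs scores higher than the scrambled one. Because the score needs only the data and the predicted labels, it fits the internal-CVI setting the abstract describes.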
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance · Data Mining Algorithms and Applications
