DCSI -- An improved measure of cluster separability based on separation and connectedness
Jana Gauss, Fabian Scheipl, Moritz Herrmann

TL;DR
The paper introduces DCSI, a new measure for evaluating cluster separability based on separation and connectedness, improving clustering assessment especially for density-based methods.
Contribution
It proposes DCSI, a novel separability measure that captures separation and connectedness, filling gaps in existing cluster validity indices for density-based clustering evaluation.
Findings
On synthetic data, DCSI correlates strongly with the performance of DBSCAN as measured by the adjusted Rand index (ARI).
DCSI effectively identifies touching or overlapping classes in real-world data.
DCSI lacks robustness on multi-class data sets with overlapping classes, which are ill-suited for density-based hard clustering.
Abstract
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering.…
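The abstract evaluates DBSCAN by the adjusted Rand index (ARI) between its cluster assignments and the class labels. As a minimal sketch of that score, the standard pair-counting ARI can be computed with the Python standard library alone. The function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings.

    Illustrative helper (name and interface are assumptions, not from the paper).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one point: the labelings trivially agree

    # Contingency counts n_ij and their marginals a_i, b_j.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs placed together in both / in each labeling.
    index = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (index - expected) / (max_index - expected)


# Toy example: class labels versus a hypothetical clustering result.
classes = [0, 0, 1, 1]
print(adjusted_rand_index(classes, [1, 1, 0, 0]))  # same partition, labels renamed
print(adjusted_rand_index(classes, [0, 0, 1, 2]))  # second class split in two
```

The score is 1 when the two partitions coincide up to a renaming of labels and is close to 0 for chance-level agreement. In this sketch every distinct label value, including a noise label such as -1, counts as its own cluster.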
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Text and Document Classification Technologies
Methods: Sparse Evolutionary Training
