A Novel Intrinsic Measure of Data Separability
Shuyue Guan, Murray Loew

TL;DR
This paper introduces the Distance-based Separability Index (DSI), an intrinsic, classifier-independent measure for quantifying dataset separability, with applications in GAN evaluation and clustering assessment.
Contribution
The paper proposes the DSI, an intrinsic measure of data separability that does not depend on any classifier model and can indicate whether sets of samples share the same distribution.
Findings
DSI indicates whether the distributions of datasets are identical, for any dimensionality.
DSI outperforms existing separability measures on synthetic and real data.
DSI has potential applications in GAN performance evaluation and clustering analysis.
Abstract
In machine learning, the performance of a classifier depends on both the classifier model and the separability/complexity of datasets. To quantitatively measure the separability of datasets, we create an intrinsic measure -- the Distance-based Separability Index (DSI), which is independent of the classifier model. We consider the situation in which different classes of data are mixed in the same distribution to be the most difficult for classifiers to separate. We then formally show that the DSI can indicate whether the distributions of datasets are identical for any dimensionality. We verify that the DSI is an effective separability measure by comparing it with several state-of-the-art separability/complexity measures on synthetic and real datasets. Having demonstrated the DSI's ability to compare distributions of samples, we also discuss some of its other promising applications, such…
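The abstract does not give the DSI formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that, for each class, the set of intra-class distances is compared with the set of between-class distances using the two-sample Kolmogorov-Smirnov statistic, and that the per-class statistics are averaged. The function name `dsi_sketch` and the choice of Euclidean distance are hypothetical; consult the full paper for the authoritative definition.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from scipy.stats import ks_2samp


def dsi_sketch(X, y):
    """Distance-based separability score in [0, 1] (illustrative only).

    Assumption: a score near 0 means the classes look like samples from
    one distribution; a score near 1 means they are well separated.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    per_class = []
    for label in np.unique(y):
        inside = X[y == label]
        outside = X[y != label]
        if len(inside) < 2 or len(outside) < 1:
            continue  # not enough points to form both distance sets
        intra = pdist(inside)                    # distances within the class
        between = cdist(inside, outside).ravel() # distances to other classes
        # KS statistic: largest gap between the two empirical CDFs
        per_class.append(ks_2samp(intra, between).statistic)
    if not per_class:
        raise ValueError("need at least two classes with enough samples")
    return float(np.mean(per_class))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0] * 200 + [1] * 200)
    mixed = rng.normal(0.0, 1.0, size=(400, 2))
    separated = np.vstack([
        rng.normal(0.0, 1.0, size=(200, 2)),
        rng.normal(50.0, 1.0, size=(200, 2)),
    ])
    print("mixed classes:    ", dsi_sketch(mixed, labels))
    print("separated classes:", dsi_sketch(separated, labels))
```

Under these assumptions, two classes drawn from the same distribution should score close to 0 (the hardest case described in the abstract), while two distant clusters should score close to 1.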
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
