Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect
José Daniel Pascual-Triana, David Charte, Marta Andrés Arroyo, Alberto Fernández, Francisco Herrera

TL;DR
This paper revisits data complexity metrics focusing on class overlap and imbalance, introducing new 'Overlap Number of Balls' metrics based on data morphology to better evaluate dataset difficulty and predict classifier performance.
Contribution
It proposes a novel family of complexity metrics based on ball coverage, improving assessment of class overlap and dataset complexity, especially in imbalanced scenarios.
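To make the idea of ball coverage concrete, here is a minimal Python sketch. This summary does not spell out the algorithm, so everything in it is an assumption for illustration rather than the authors' method: each ball is centred on a sample and extends to the nearest sample of another class, balls are chosen greedily, and the function name `onb_sketch` and its two return values are hypothetical. The intuition is that a class needing many balls relative to its size is heavily interleaved with other classes.

```python
import numpy as np


def onb_sketch(X, y):
    """Illustrative ball-coverage ratios (assumed procedure, not the paper's exact one).

    Returns (total_ratio, class_averaged_ratio), both in (0, 1].
    Assumes at least two classes are present in y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    total_balls = 0
    per_class_ratios = []
    for label in np.unique(y):
        same = np.flatnonzero(y == label)
        other = np.flatnonzero(y != label)
        # Ball radius: distance from each sample to its nearest other-class sample.
        radius = dist[np.ix_(same, other)].min(axis=1)
        # covers[i, j] is True when the open ball centred on same[i] contains same[j].
        covers = dist[np.ix_(same, same)] < radius[:, None]
        # A ball always covers its own centre, even if the radius is zero.
        np.fill_diagonal(covers, True)

        uncovered = np.ones(len(same), dtype=bool)
        n_balls = 0
        while uncovered.any():
            # Greedy step: pick the ball that covers the most uncovered samples.
            gains = (covers & uncovered).sum(axis=1)
            best = int(gains.argmax())
            uncovered &= ~covers[best]
            n_balls += 1

        total_balls += n_balls
        per_class_ratios.append(n_balls / len(same))

    return total_balls / len(y), float(np.mean(per_class_ratios))


# Two well-separated clusters: one ball per class is enough.
separated = onb_sketch([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]],
                       [0, 0, 0, 1, 1, 1])
# Classes alternating along a line: every sample needs its own ball.
interleaved = onb_sketch([[0], [1], [2], [3]], [0, 1, 0, 1])
```

In this sketch, values near 0 suggest little overlap and values near 1 suggest heavy overlap. Averaging the ratio per class, rather than pooling all samples, keeps a small minority class from being drowned out by a large majority class, which is why a class-averaged variant is a natural fit for imbalanced data.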
Findings
New 'Overlap Number of Balls' metrics effectively estimate class overlap.
Metrics show strong correlation with classification performance.
Prospects for adapting metrics to complex singular problems are discussed.
Abstract
Data Science and Machine Learning have become fundamental assets for companies and research institutions alike. As one of their fields, supervised classification allows for class prediction of new samples, learning from given training data. However, some properties can cause datasets to be problematic to classify. In order to evaluate a dataset a priori, data complexity metrics have been used extensively. They provide information regarding different intrinsic characteristics of the data, which helps assess classifier compatibility and choose a course of action that improves performance. However, most complexity metrics focus on just one characteristic of the data, which can be insufficient to properly evaluate the dataset with respect to the classifiers' performance. In fact, class overlap, a very detrimental feature for the classification process (especially when imbalance among class labels is…
