How Complex is your classification problem? A survey on measuring classification complexity
Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho

TL;DR
This survey reviews various measures of classification complexity derived from training data, discussing their use in estimating problem difficulty and supporting data-driven techniques, while also introducing an R package for these measures.
Contribution
It provides a comprehensive review of classification complexity measures, analyzes their application in recent research, and introduces a publicly available R package implementing these measures.
Findings
Complexity measures help estimate classification difficulty.
Recent literature demonstrates diverse applications of complexity measures.
The ECoL R package facilitates the use of these measures in practice.
Abstract
Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the known measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenges highlighted by such characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed…
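As a concrete illustration of a measure that estimates how hard it is to separate data points into their classes, the sketch below computes a per-feature Fisher's discriminant ratio for a two-class dataset and reports the largest value. This is a minimal sketch under stated assumptions: the function names and the toy data are hypothetical, the formula is one common textbook formulation, and it is not necessarily the exact definition used in the survey or in the ECoL package.

```python
from statistics import mean, pvariance


def fisher_ratio_per_feature(X, y):
    """Two-class Fisher's discriminant ratio for each feature.

    X: list of equal-length numeric rows; y: list of labels (exactly two classes).
    Ratio for feature j: (mean_a - mean_b)^2 / (var_a + var_b).
    Hypothetical helper, written for illustration only.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    rows_a = [row for row, label in zip(X, y) if label == classes[0]]
    rows_b = [row for row, label in zip(X, y) if label == classes[1]]

    ratios = []
    for j in range(len(X[0])):
        col_a = [row[j] for row in rows_a]
        col_b = [row[j] for row in rows_b]
        num = (mean(col_a) - mean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den == 0:
            # Zero spread in both classes: perfectly separated or identical.
            ratios.append(float("inf") if num > 0 else 0.0)
        else:
            ratios.append(num / den)
    return ratios


def max_fisher_ratio(X, y):
    """Largest per-feature ratio; higher suggests an easier problem."""
    return max(fisher_ratio_per_feature(X, y))


# Toy data (assumed): feature 0 separates the classes well, feature 1 does not.
X_toy = [[0.0, 5.0], [1.0, 6.0], [10.0, 5.5], [11.0, 6.5]]
y_toy = [0, 0, 1, 1]
toy_ratios = fisher_ratio_per_feature(X_toy, y_toy)
toy_max = max_fisher_ratio(X_toy, y_toy)
```

On this toy data the first feature has widely separated class means relative to its within-class spread, so it dominates the maximum; a dataset where every feature yields a small ratio would be flagged as harder to separate along any single axis.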
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms
