The Art of Misclassification: Too Many Classes, Not Enough Points
Mario Franco, Gerardo Febres, Nelson Fernández, Carlos Gershenson

TL;DR
This paper introduces an entropy-based measure of the inherent difficulty of classification problems, revealing fundamental limits on achievable accuracy regardless of models or data size.
Contribution
It proposes a formal, entropy-based metric of classificability that captures a dataset's intrinsic difficulty and establishes theoretical performance bounds.
Findings
The measure quantifies class overlap and uncertainty.
It provides an upper bound on classification accuracy.
The framework explains when classification is fundamentally ambiguous.
Abstract
Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classificability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or…
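To make the idea of "uncertainty in class assignments given feature representations" concrete, the following is a minimal sketch, not the paper's actual measure. It assumes discrete feature values, estimates the conditional entropy H(Y|X) from sample counts, and computes the best accuracy any rule mapping a feature value to a single class could reach on that sample. The function name `conditional_entropy_and_bound` and the toy data are hypothetical, introduced only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy_and_bound(features, labels):
    """Return (H(Y|X) in bits, best achievable accuracy) on a labelled sample.

    Assumption: features are discrete and hashable, so points with the same
    feature value are indistinguishable to any classifier.
    """
    # Count how often each class appears for each distinct feature value.
    counts_by_x = defaultdict(Counter)
    for x, y in zip(features, labels):
        counts_by_x[x][y] += 1

    n = len(labels)
    entropy = 0.0
    best_correct = 0
    for class_counts in counts_by_x.values():
        n_x = sum(class_counts.values())
        # Weighted entropy of the class distribution at this feature value.
        for c in class_counts.values():
            p = c / n_x
            entropy -= (n_x / n) * p * log2(p)
        # The best a classifier can do here is predict the majority class.
        best_correct += max(class_counts.values())

    return entropy, best_correct / n


# Toy data (hypothetical): classes overlap at feature value "a" only.
features = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
h, bound = conditional_entropy_and_bound(features, labels)
print(f"H(Y|X) = {h:.4f} bits, accuracy bound = {bound:.3f}")
```

In this toy sample, one of the four points at value "a" carries a different label from the rest, so no classifier that sees only the feature value can label all eight points correctly. Zero conditional entropy would correspond to no class overlap and a bound of 1.0.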
Taxonomy
Topics: Social Policy and Reform Studies · Economic Theory and Policy
