Limits to classification performance by relating Kullback-Leibler divergence to Cohen's Kappa
L. Crow, S. J. Watts

TL;DR
This paper establishes fundamental theoretical limits on classification performance by relating Cohen's Kappa to Kullback-Leibler divergences, providing a way to assess how close algorithms are to optimal error rates based on data distributions.
Contribution
It introduces a novel relationship between Cohen's Kappa and the Resistor Average Distance of Kullback-Leibler divergences, linking theoretical limits to practical classification metrics.
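For reference, the quantities named above can be written out with their standard textbook definitions. The symbols ($P$, $Q$, $p_o$, $p_e$) are notation chosen here, not necessarily the paper's, and the paper's own formula linking $\kappa$ to $R(P,Q)$ is not reproduced because this summary does not state it.

```latex
% Kullback-Leibler divergence between class densities p and q, in bits
D(P \,\|\, Q) = \int p(x) \log_2 \frac{p(x)}{q(x)} \, dx

% Resistor Average Distance: the "parallel resistor" combination
% of the two (asymmetric) KL divergences
\frac{1}{R(P,Q)} = \frac{1}{D(P \,\|\, Q)} + \frac{1}{D(Q \,\|\, P)}

% Cohen's Kappa from observed agreement p_o and chance agreement p_e
\kappa = \frac{p_o - p_e}{1 - p_e}
```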
Findings
Algorithms often reach the theoretical performance limit dictated by data distributions.
The method accurately predicts the maximum achievable performance on diverse datasets.
Performance is constrained by data quality and variable relevance, not just algorithm choice.
Abstract
The performance of machine learning classification algorithms is evaluated by estimating metrics, often from the confusion matrix, using training data and cross-validation. However, these do not prove that the best possible performance has been achieved. Fundamental limits to error rates can be estimated using information distance measures. To this end, the confusion matrix has been formulated to comply with the Chernoff-Stein Lemma. This links the error rates to the Kullback-Leibler divergences between the probability density functions describing the two classes. This leads to a key result that relates Cohen's Kappa to the Resistor Average Distance, which is the parallel resistor combination of the two Kullback-Leibler divergences. The Resistor Average Distance has units of bits and is estimated from the same training data used by the classification algorithm, using kNN estimates of…
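As a rough illustration of the two quantities the abstract relates, the sketch below computes Cohen's Kappa from a confusion matrix and the Resistor Average Distance from two distributions. It rests on simplifying assumptions: the distributions are small discrete vectors with strictly positive entries, whereas the paper estimates the divergences from training data with kNN estimators. The function names are hypothetical, and the paper's formula connecting Kappa to the distance is not implemented here.

```python
import numpy as np


def kl_divergence_bits(p, q):
    """KL divergence D(p || q) in bits for discrete distributions.

    Assumes q > 0 wherever p > 0 (an assumption of this sketch).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def resistor_average_distance(p, q):
    """Parallel-resistor combination of D(p || q) and D(q || p), in bits."""
    d_pq = kl_divergence_bits(p, q)
    d_qp = kl_divergence_bits(q, p)
    if d_pq == 0.0 or d_qp == 0.0:
        return 0.0  # identical distributions: zero distance
    return 1.0 / (1.0 / d_pq + 1.0 / d_qp)


def cohens_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix of counts."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n
    # Chance agreement from the row and column marginals
    p_expected = float(np.sum(c.sum(axis=0) * c.sum(axis=1))) / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Made-up illustrative inputs, not data from the paper
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print("RAD (bits):", resistor_average_distance(p, q))
    print("Kappa:", cohens_kappa([[45, 5], [10, 40]]))
```

Unlike either Kullback-Leibler divergence on its own, the parallel combination is symmetric in its two arguments and never exceeds the smaller of the two divergences.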
Taxonomy
Topics: Advanced Statistical Methods and Models
