The Effect of Class Imbalance on Precision-Recall Curves
Christopher K I Williams

TL;DR
This paper investigates how class imbalance affects the precision-recall curve and related metrics, providing a way to predict their changes with varying positive-negative ratios in test data.
Contribution
It gives a relationship for predicting how class imbalance changes the precision-recall curve and associated metrics, a result that the author notes seems not to be well known.
Findings
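Precision depends on the ratio of positive to negative test cases and on the classifier's true and false positive rates.
This relationship makes it possible to predict how the precision-recall curve changes as the class ratio changes.
The result bears on how classifiers are evaluated and compared on imbalanced test data.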
Abstract
In this note I study how the precision of a classifier depends on the ratio $r$ of positive to negative cases in the test set, as well as the classifier's true and false positive rates. This relationship allows prediction of how the precision-recall curve will change with $r$, which seems not to be well known. It also allows prediction of how $F_\beta$ and the Precision Gain and Recall Gain measures of Flach and Kull (2015) vary with $r$.
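The dependence described in the abstract can be illustrated with a short sketch. It follows from the standard definitions: with P positives and N negatives, TP = TPR * P and FP = FPR * N, so precision = r * TPR / (r * TPR + FPR), where r = P / N. The function names and the example operating points below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math


def precision_from_rates(tpr, fpr, r):
    """Precision implied by a true positive rate, a false positive rate,
    and r = (number of positives) / (number of negatives) in the test set.

    Derived from the definitions: TP = tpr * P, FP = fpr * N, r = P / N.
    """
    denom = r * tpr + fpr
    if denom == 0:
        # No predicted positives at all: precision is undefined.
        return math.nan
    return r * tpr / denom


def pr_points_at_ratio(roc_points, r):
    """Map (fpr, tpr) operating points to (recall, precision) points for a
    given class ratio r. Recall equals tpr and does not depend on r."""
    return [(tpr, precision_from_rates(tpr, fpr, r)) for fpr, tpr in roc_points]


# Hypothetical operating points of one classifier, as (fpr, tpr) pairs.
roc_points = [(0.01, 0.5), (0.1, 0.8), (0.3, 0.95)]

balanced = pr_points_at_ratio(roc_points, r=1.0)     # equal class sizes
imbalanced = pr_points_at_ratio(roc_points, r=0.01)  # 1 positive per 100 negatives

for (recall, p_bal), (_, p_imb) in zip(balanced, imbalanced):
    print(f"recall={recall:.2f}  precision(r=1)={p_bal:.3f}  precision(r=0.01)={p_imb:.3f}")
```

In this sketch the recall values are identical for both ratios, while precision at each operating point drops as r shrinks. That is the sense in which a precision-recall curve measured at one class ratio can be rescaled to another, provided the true and false positive rates stay fixed.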
