The Effect of Class Imbalance on Precision-Recall Curves

Christopher K I Williams

arXiv:2007.01905·cs.LG·April 28, 2021

TL;DR

This note studies how class imbalance affects the precision-recall curve and related metrics, and shows how to predict their changes as the ratio of positive to negative cases in the test set varies.

Contribution

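The note relates precision to the ratio of positive to negative test cases and to the classifier's true and false positive rates. This makes it possible to predict how the precision-recall curve, $F_{\beta}$, and the Precision Gain and Recall Gain measures change under class imbalance, a relationship the author notes seems not to be well known.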

Findings

01

Precision depends on the ratio $r$ of positive to negative test cases and on the classifier's true and false positive rates.

02

This relationship allows prediction of how the precision-recall curve changes as $r$ varies.

03

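The same relationship predicts how $F_{\beta}$ and the Precision Gain and Recall Gain measures vary with $r$, which matters when evaluating classifiers on imbalanced data.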
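The dependence in the first finding can be sketched directly from the standard definitions. The notation below is an assumption for illustration and may differ from the paper's: $P$ and $N$ are the numbers of positive and negative test cases, $r = P/N$, and TPR and FPR are the true and false positive rates at a fixed decision threshold.

```latex
\begin{align*}
\mathrm{TP} &= \mathrm{TPR}\cdot P, \qquad \mathrm{FP} = \mathrm{FPR}\cdot N, \\
\mathrm{Precision}
  &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}
   = \frac{\mathrm{TPR}\cdot P}{\mathrm{TPR}\cdot P + \mathrm{FPR}\cdot N}
   = \frac{r\,\mathrm{TPR}}{r\,\mathrm{TPR} + \mathrm{FPR}}, \\
\mathrm{Recall} &= \mathrm{TPR}.
\end{align*}
```

Recall does not involve $r$, so under these definitions a change in class ratio moves each point of the precision-recall curve vertically: precision falls as $r$ shrinks, while recall stays fixed.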

Abstract

In this note I study how the precision of a classifier depends on the ratio $r$ of positive to negative cases in the test set, as well as the classifier's true and false positive rates. This relationship allows prediction of how the precision-recall curve will change with $r$, which seems not to be well known. It also allows prediction of how $F_{\beta}$ and the Precision Gain and Recall Gain measures of Flach and Kull (2015) vary with $r$.
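A minimal numerical sketch of the idea in the abstract, assuming only the standard definitions of precision, recall and $F_{\beta}$. The function names are hypothetical and are not taken from the paper; `r` is the positive-to-negative ratio, and recall is taken to equal the true positive rate.

```python
import math


def precision_from_rates(tpr, fpr, r):
    """Precision implied by TPR, FPR and the positive:negative ratio r.

    Uses precision = r*TPR / (r*TPR + FPR), which follows from
    TP = TPR * P and FP = FPR * N with r = P / N.
    """
    denom = r * tpr + fpr
    if denom == 0:
        return math.nan  # no predicted positives: precision is undefined
    return r * tpr / denom


def precision_at_new_ratio(precision, r_old, r_new):
    """Predict precision at ratio r_new from precision measured at r_old.

    Assumes TPR and FPR are unchanged at the same threshold, so only
    the class ratio differs between the two test sets.
    """
    denom = r_new * precision + r_old * (1.0 - precision)
    if denom == 0:
        return math.nan
    return r_new * precision / denom


def f_beta(precision, recall, beta=1.0):
    """Standard F_beta score from precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1.0 + b2) * precision * recall / denom


def pr_curve_at_ratio(roc_points, r):
    """Map (FPR, TPR) points to (recall, precision) points at ratio r."""
    return [(tpr, precision_from_rates(tpr, fpr, r)) for fpr, tpr in roc_points]
```

For example, `pr_curve_at_ratio([(0.1, 0.8), (0.3, 0.95)], r=0.01)` re-expresses two hypothetical ROC operating points as precision-recall points for a test set with one positive per hundred negatives; calling it again with `r=1.0` shows how much higher the same classifier's precision looks on balanced data.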
