Precision and Recall Reject Curves for Classification

Lydia Fischer; Patricia Wollstadt

arXiv:2308.08381 · cs.LG · March 15, 2024 · 1 citation

Open Access 3 Reviews

TL;DR

This paper introduces reject curves based on precision and recall to evaluate classifier performance in high-certainty scenarios, especially for imbalanced data, providing more relevant insights than traditional accuracy-based reject curves.

Contribution

The paper proposes new reject curves for precision and recall, extending existing accuracy reject curves to better suit applications with imbalanced data and different performance metrics.

Findings

01

Precision- and recall-reject curves are more informative than accuracy-reject curves on imbalanced and real-world data.

02

Experiments with prototype-based classifiers from learning vector quantization validate the proposed curves.

03

The new curves offer more relevant insights into classifier performance in specific scenarios.

Abstract

For some classification scenarios, it is desirable to use only those classification instances that a trained model associates with a high certainty. To obtain such high-certainty instances, previous work has proposed accuracy-reject curves. Reject curves make it possible to evaluate and compare the performance of different certainty measures over a range of thresholds for accepting or rejecting classifications. However, accuracy may not be the most suitable evaluation metric for all applications; precision or recall may be preferable instead. This is the case, for example, for data with imbalanced class distributions. We therefore propose two reject curves that evaluate precision and recall: the recall-reject curve and the precision-reject curve. Using prototype-based classifiers from learning vector quantization, we first validate the proposed curves on artificial benchmark data against the…
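As a rough illustration of the curves described in the abstract, the following Python sketch computes accuracy, precision, and recall on the accepted samples for every certainty threshold of a binary task. It assumes that a sample is accepted when its certainty is at least the threshold, that the positive class is labeled 1, and that each metric is evaluated only on the accepted samples. The function name `reject_curves` and the toy data are hypothetical, and the paper's exact definitions (especially for multi-class data) may differ.

```python
import numpy as np


def reject_curves(y_true, y_pred, certainty, positive=1):
    """Hypothetical sketch of accuracy-, precision- and recall-reject curves.

    Assumption: a sample is accepted iff certainty >= threshold, and every
    metric is computed on the accepted samples only (binary task).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    certainty = np.asarray(certainty, dtype=float)
    thresholds = np.unique(certainty)  # observed certainties, sorted ascending
    ratio, acc, prec, rec = [], [], [], []
    for t in thresholds:
        accepted = certainty >= t  # never empty: t is an observed certainty
        yt, yp = y_true[accepted], y_pred[accepted]
        tp = np.sum((yp == positive) & (yt == positive))
        fp = np.sum((yp == positive) & (yt != positive))
        fn = np.sum((yp != positive) & (yt == positive))
        ratio.append(accepted.mean())  # fraction of samples still accepted
        acc.append(np.mean(yt == yp))
        # Precision/recall are undefined (NaN) when their denominator is zero.
        prec.append(tp / (tp + fp) if tp + fp > 0 else np.nan)
        rec.append(tp / (tp + fn) if tp + fn > 0 else np.nan)
    return {
        "threshold": thresholds,
        "accepted_ratio": np.array(ratio),
        "accuracy": np.array(acc),
        "precision": np.array(prec),
        "recall": np.array(rec),
    }


# Invented, imbalanced toy example: 7 negatives, 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
certainty = [0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.6, 0.7, 0.6, 0.55]
curves = reject_curves(y_true, y_pred, certainty)
```

In this invented example, the accuracy of 0.7 at full acceptance hides that recall on the minority class is only 1/3, which is the kind of gap the precision- and recall-reject curves are meant to expose. Plotting each metric against `accepted_ratio` gives the corresponding reject curve.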

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 1 (strong reject) · Confidence 4

Strengths

- The paper is well-written and does a good job of covering prior work on classification with rejection.
- Exploring alternate ways of evaluating a classifier with a reject option is an important problem to tackle.

Weaknesses

- The paper is limited in novelty. The fact that metrics based on precision and recall are better alternatives to accuracy under class imbalance is well known in the literature. Merely proposing their use in evaluating rejection-based classifiers does not make for a significant or novel contribution.
- The experimental conclusions aren't very strong either. It is interesting that precision and recall curves show different trends compared to accuracy curves, but this observation alone does not ma…

Reviewer 02 · Rating 1 (strong reject) · Confidence 4

Strengths

The assumptions upon which the work rests are valid, and it is quite likely that precision and/or recall reject curves are a good alternative to ARCs on imbalanced datasets.

Weaknesses

Having said this, this extension is a very trivial one, as it is well known (as the authors also observe) that precision and recall (and F1) are better evaluation measures for imbalanced data. So this paper could, in my opinion, only be of interest if the evaluation were exceptionally thorough and valid, so that the paper clearly establishes the value of this proposal. Unfortunately, I think the evaluation is lacking for several reasons:
- the comparison is only on 4 sm…

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

Rejection options are an important area of research, particularly impactful in some domains like medicine.

Weaknesses

It is unclear how this is different from standard precision-recall curves. In the binary classification case, given that precision and recall focus on the positive class only, this work appears to simplify down to precision-recall curves. It therefore does not appear to propose anything novel that would help with choosing particular thresholds for a task. It may be different in the case of multiple classes, but none of the experimental datasets have more than two classes.
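The relationship raised in this review can be checked numerically in a simplified setting. The sketch below assumes a binary classifier that outputs a score s (an estimate of P(y = 1)), predicts positive when s ≥ 0.5, and uses certainty max(s, 1 − s); these are illustrative assumptions, not the certainty measures used with learning vector quantization in the paper. A precision-recall curve point moves the decision threshold τ and evaluates every sample, whereas a reject-curve point keeps the decision fixed and drops samples whose certainty is below t. The helper names and data are hypothetical.

```python
import numpy as np


def pr_curve_point(y_true, score, tau):
    """Standard PR-curve point: predict positive iff score >= tau, keep all samples."""
    y_true, score = np.asarray(y_true), np.asarray(score, dtype=float)
    pred_pos = score >= tau
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    fn = np.sum(~pred_pos & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)


def reject_curve_point(y_true, score, t):
    """Reject-curve point (assumed definition): decision fixed at 0.5,
    only samples with certainty max(s, 1 - s) >= t are evaluated."""
    y_true, score = np.asarray(y_true), np.asarray(score, dtype=float)
    pred_pos = score >= 0.5
    accepted = np.maximum(score, 1.0 - score) >= t
    tp = np.sum(accepted & pred_pos & (y_true == 1))
    fp = np.sum(accepted & pred_pos & (y_true == 0))
    fn = np.sum(accepted & ~pred_pos & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)


# Invented scores for 10 samples (5 positives).
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
score = np.array([0.95, 0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2, 0.05])

p_pr, r_pr = pr_curve_point(y_true, score, 0.65)
p_rj, r_rj = reject_curve_point(y_true, score, 0.65)
```

Under these assumptions and for t ≥ 0.5, the accepted positive predictions are exactly the samples with s ≥ t, so the reject-curve precision equals the PR-curve precision at τ = t (0.75 in both cases here). Recall differs (0.75 versus 0.6) because rejected positives drop out of the reject-curve denominator. The argument relies on certainty being a monotone function of a single score; other certainty measures need not behave this way.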

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning in Healthcare · Machine Learning and Data Classification