Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC)
Xin Tong, Yang Feng, Jingyi Jessica Li

TL;DR
This paper introduces a new umbrella algorithm for implementing Neyman-Pearson classification across various models, along with NP-ROC bands for better classifier evaluation and selection, addressing a gap in controlling type I errors.
Contribution
It develops the first universal umbrella algorithm for Neyman-Pearson classification applicable to all scoring classifiers and introduces NP-ROC bands for data-driven classifier comparison.
Findings
The umbrella algorithm keeps the type I error below the desired threshold.
NP-ROC bands assist in selecting optimal significance levels.
The methods are validated through simulations and real data analysis.
Abstract
In many binary classification applications, such as disease diagnosis and spam detection, practitioners often need to control the type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes the type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are still likely to have…
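The abstract's point that capping the empirical type I error does not control the true type I error can be illustrated with an order-statistic threshold on held-out class 0 scores. The following is a minimal sketch in the spirit of NP classification, not a reproduction of the paper's implementation. The violation tolerance `delta`, the function names `np_threshold_rank` and `np_threshold`, and the assumption that the class 0 scores are i.i.d. with a continuous distribution are all illustrative assumptions that do not appear in this summary; `alpha` stands for the type I error bound α.

```python
import math


def np_threshold_rank(n, alpha, delta):
    """Smallest 1-indexed rank k among n sorted class 0 scores such that
    thresholding at the k-th smallest score gives
    P(type I error > alpha) <= delta. Returns None if no rank qualifies.

    Assumption: for i.i.d. continuous scores, the violation probability
    at rank k equals P(Binomial(n, 1 - alpha) >= k).
    """
    tail = 0.0
    best = None
    # Walk down from the largest rank, accumulating the binomial upper tail.
    for k in range(n, 0, -1):
        tail += math.comb(n, k) * (1 - alpha) ** k * alpha ** (n - k)
        if tail > delta:
            break  # the tail only grows as k decreases
        best = k
    return best


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Score threshold; predict class 1 when a new score exceeds it."""
    scores = sorted(class0_scores)
    k = np_threshold_rank(len(scores), alpha, delta)
    if k is None:
        raise ValueError("too few class 0 scores for this alpha and delta")
    return scores[k - 1]


if __name__ == "__main__":
    n, alpha, delta = 1000, 0.05, 0.05
    naive_rank = math.ceil((1 - alpha) * n)  # caps only the empirical error
    np_rank = np_threshold_rank(n, alpha, delta)
    print("naive rank:", naive_rank, "NP-style rank:", np_rank)
```

Under these assumptions the NP-style rank sits above the naive empirical rank, so the threshold is more conservative. The sketch also shows that a minimum number of class 0 scores is needed before any rank qualifies. For very large `n` the direct binomial sum may overflow or underflow in floating point, so a library routine for the binomial tail would be a safer choice there.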
Taxonomy
Topics: Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
