Predictive Value Generalization Bounds
Keshav Vemuri, Nathan Srebro

TL;DR
This paper develops new distribution-free bounds for the positive and negative predictive values in binary classification, which bounds on the overall error rate cannot provide on their own, and introduces a complexity measure called the order coefficient.
Contribution
The paper introduces novel large deviation and uniform convergence bounds for predictive values; the uniform convergence bound is stated in terms of a new complexity measure called the order coefficient.
Findings
Derived distribution-free bounds for predictive values
Established a relation between the order coefficient and the VC-subgraph dimension
Provided theoretical guarantees for scoring function generalization
Abstract
In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (ppv and npv, respectively) are conditional probabilities of the true label matching a classifier's predicted label. The usual classification error rate is a linear combination of these probabilities, and therefore, concentration inequalities for the error rate do not yield confidence intervals for the two separate predictive values. We study generalization properties of scoring functions with respect to predictive values by deriving new distribution-free large deviation and uniform convergence bounds. The latter bound is stated in terms of a measure of function class complexity that we call the order coefficient; we relate this combinatorial quantity to the VC-subgraph dimension.
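To make the point about the error rate concrete, the following is a minimal sketch (not taken from the paper) that computes empirical predictive values for a thresholded scoring function and checks that the error rate equals a weighted combination of 1 - PPV and 1 - NPV. The function names, the toy data, and the convention that a score at or above the threshold counts as a positive prediction are all assumptions made for illustration.

```python
import math


def predictive_values(labels, scores, threshold):
    """Empirical PPV and NPV of the classifier 'predict 1 iff score >= threshold'.

    Hypothetical helper for illustration; the thresholding convention is an assumption.
    Returns (ppv, npv, frac_predicted_pos, frac_predicted_neg).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    n = len(labels)
    n_pred_pos = sum(preds)
    n_pred_neg = n - n_pred_pos
    # True positives: predicted 1 and label 1. True negatives: predicted 0 and label 0.
    tp = sum(1 for y, p in zip(labels, preds) if p == 1 and y == 1)
    tn = sum(1 for y, p in zip(labels, preds) if p == 0 and y == 0)
    # A predictive value is undefined when no example receives that prediction.
    ppv = tp / n_pred_pos if n_pred_pos else math.nan
    npv = tn / n_pred_neg if n_pred_neg else math.nan
    return ppv, npv, n_pred_pos / n, n_pred_neg / n


def error_rate(labels, scores, threshold):
    """Fraction of examples whose thresholded prediction differs from the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for y, p in zip(labels, preds) if y != p) / len(labels)


# Toy data (made up for this example).
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

ppv, npv, frac_pos, frac_neg = predictive_values(labels, scores, threshold=0.5)
err = error_rate(labels, scores, threshold=0.5)

# The error rate is a weighted combination of (1 - PPV) and (1 - NPV),
# with weights given by how often each label is predicted.
combined = frac_pos * (1 - ppv) + frac_neg * (1 - npv)

print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
print(f"error rate = {err:.3f}, weighted combination = {combined:.3f}")
```

On this toy sample the two predictive values differ (2/3 versus 4/5) even though a single error rate of 0.25 summarizes both, which illustrates why a confidence interval for the error rate alone does not pin down either predictive value separately.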
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
