Predictive Value Generalization Bounds

Keshav Vemuri; Nathan Srebro

arXiv:2007.05073 · stat.ML · July 13, 2020

TL;DR

This paper develops new distribution-free bounds for the positive and negative predictive values (ppv and npv) in binary classification. Standard error-rate bounds cannot provide separate confidence intervals for these two quantities. The paper also introduces a complexity measure called the order coefficient.

Contribution

It introduces new large deviation and uniform convergence bounds for predictive values. The uniform convergence bound is stated in terms of a new complexity measure, the order coefficient.
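As a point of reference for what a large deviation bound on a single predictive value looks like, the sketch below applies the standard two-sided Hoeffding inequality to the empirical ppv of one fixed classifier. This is a textbook baseline, not the paper's bound, and it assumes the classifier was chosen independently of the i.i.d. evaluation sample. Conditioned on which points are predicted positive, their labels are i.i.d. Bernoulli(ppv), so the interval width depends on the number of predicted positives (`n_pos`) rather than on the total sample size. The function name `ppv_hoeffding_interval` and its `delta` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ppv_hoeffding_interval(
    y_true: Sequence[int], y_pred: Sequence[int], delta: float = 0.05
) -> Tuple[float, float]:
    """Two-sided Hoeffding interval for ppv = P(y = 1 | yhat = 1).

    Assumes a fixed classifier chosen independently of this i.i.d. sample.
    The interval contains the true ppv with probability at least 1 - delta.
    """
    # Labels of the points the classifier predicts positive.
    labels = [t for t, p in zip(y_true, y_pred) if p == 1]
    n_pos = len(labels)
    if n_pos == 0:
        raise ValueError("no predicted positives; ppv cannot be estimated")
    ppv_hat = sum(labels) / n_pos
    # Hoeffding: P(|ppv_hat - ppv| > eps) <= 2 exp(-2 n_pos eps^2) = delta.
    eps = math.sqrt(math.log(2 / delta) / (2 * n_pos))
    # Clip to [0, 1] since ppv is a probability.
    return max(0.0, ppv_hat - eps), min(1.0, ppv_hat + eps)


# Hypothetical data: 100 predicted positives, 80 of them truly positive.
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 100
lo, hi = ppv_hoeffding_interval(y_true, y_pred)
print(f"ppv in [{lo:.3f}, {hi:.3f}] with probability >= 0.95")
```

With 100 predicted positives and delta = 0.05 the half-width is about 0.136. A classifier that rarely predicts positive leaves `n_pos` small and the interval correspondingly wide, even when the full sample is large.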

Findings

1. Derived distribution-free large deviation and uniform convergence bounds for predictive values
2. Related the order coefficient to the VC-subgraph dimension
3. Provided theoretical guarantees on how scoring functions generalize with respect to predictive values

Abstract

In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (ppv and npv, respectively) are conditional probabilities of the true label matching a classifier's predicted label. The usual classification error rate is a linear combination of these probabilities, and therefore, concentration inequalities for the error rate do not yield confidence intervals for the two separate predictive values. We study generalization properties of scoring functions with respect to predictive values by deriving new distribution-free large deviation and uniform convergence bounds. The latter bound is stated in terms of a measure of function class complexity that we call the order coefficient; we relate this combinatorial quantity to the VC-subgraph dimension.
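To make the definitions concrete, the following sketch computes empirical ppv and npv for a 0/1 classifier and checks the decomposition error = q(1 - ppv) + (1 - q)(1 - npv), where q is the fraction of points predicted positive. This decomposition is a standard identity rather than a formula quoted from the paper, and the helper names `predictive_values` and `error_rate` are hypothetical. Because the weights q and 1 - q depend on the classifier, a bound on the error rate controls q(1 - ppv) but says little about ppv itself when q is small.

```python
import math
from typing import Sequence, Tuple


def predictive_values(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Empirical ppv = P(y=1 | yhat=1) and npv = P(y=0 | yhat=0)."""
    pos = [t for t, p in zip(y_true, y_pred) if p == 1]  # labels where yhat = 1
    neg = [t for t, p in zip(y_true, y_pred) if p == 0]  # labels where yhat = 0
    if not pos or not neg:
        raise ValueError("need at least one predicted positive and one predicted negative")
    ppv = sum(pos) / len(pos)
    npv = sum(1 - t for t in neg) / len(neg)
    return ppv, npv


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of points where the predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 0, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
ppv, npv = predictive_values(y_true, y_pred)
q = sum(y_pred) / len(y_pred)  # fraction predicted positive
err = error_rate(y_true, y_pred)
# The error rate mixes both predictive values with classifier-dependent weights.
assert math.isclose(err, q * (1 - ppv) + (1 - q) * (1 - npv))
print(f"ppv={ppv:.3f} npv={npv:.3f} error={err:.3f}")  # ppv=0.667 npv=0.600 error=0.375
```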

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification