Risk bounds for PU learning under Selected At Random assumption
Olivier Coudray (CELESTE), Christine Keribin (CELESTE), Pascal Massart (CELESTE), Patrick Pamphile (CELESTE)

TL;DR
This paper establishes risk bounds for positive-unlabeled learning under the Selected At Random assumption, analyzing the impact of label noise and providing near-optimal theoretical guarantees.
Contribution
It introduces risk bounds for PU learning with covariate-dependent labeling probabilities and quantifies label noise effects, along with a minimax risk lower bound.
Findings
Risk bounds are derived under the Selected At Random assumption.
The impact of label noise on PU learning is quantified.
A lower bound on minimax risk shows near-optimality of the bounds.
Abstract
Positive-unlabeled learning (PU learning) is a special case of semi-supervised binary classification in which only a fraction of the positive examples are labeled. The challenge is then to find the correct classifier despite this lack of information. Recently, new methodologies have been introduced to address the case where the probability of being labeled may depend on the covariates. In this paper, we establish risk bounds for PU learning under this general assumption. In addition, we quantify the impact of label noise on PU learning compared to the standard classification setting. Finally, we provide a lower bound on the minimax risk, proving that the upper bound is almost optimal.
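To make the labeling assumption concrete, the following is a minimal simulation sketch of PU data in which the probability of labeling a positive example depends on the covariate. It is not taken from the paper: the functions `posterior` and `propensity`, their logistic forms, and the one-dimensional uniform covariate are all hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def posterior(x):
    """Hypothetical class posterior P(Y=1 | X=x) (an assumption, not from the paper)."""
    return sigmoid(4.0 * x)


def propensity(x):
    """Hypothetical labeling probability P(S=1 | X=x, Y=1).

    It varies with x, so labeling depends on the covariate
    rather than being constant across positives.
    """
    return 0.2 + 0.6 * sigmoid(-3.0 * x)


def simulate_sar_pu(n, seed=0):
    """Draw n triples (x, y, s): covariate, true class, observed label indicator."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if rng.random() < posterior(x) else 0
        # Only positive examples can be labeled; negatives always stay unlabeled.
        s = 1 if (y == 1 and rng.random() < propensity(x)) else 0
        sample.append((x, y, s))
    return sample


def labeled_fraction(pairs):
    """Fraction of labeled examples in a list of (x, s) pairs."""
    return sum(s for _, s in pairs) / len(pairs)


data = simulate_sar_pu(20000)
positives = [(x, s) for x, y, s in data if y == 1]

# Empirical labeling rate among true positives on each half of the covariate range.
rate_low = labeled_fraction([(x, s) for x, s in positives if x < 0])
rate_high = labeled_fraction([(x, s) for x, s in positives if x >= 0])

print(f"labeled fraction among positives, x < 0:  {rate_low:.3f}")
print(f"labeled fraction among positives, x >= 0: {rate_high:.3f}")
```

In this sketch, every labeled example is truly positive, while the unlabeled set mixes negatives with unlabeled positives. Because the labeling rate among positives differs across the covariate range, treating unlabeled examples as negatives distorts the apparent class posterior in a covariate-dependent way, which is the difficulty the abstract refers to.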
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms
