Risk bounds for PU learning under Selected At Random assumption

Olivier Coudray (CELESTE); Christine Keribin (CELESTE); Pascal Massart (CELESTE); Patrick Pamphile (CELESTE)

arXiv:2201.06277 · math.ST · January 19, 2022 · 1 citation

Open Access

TL;DR

This paper establishes risk bounds for positive-unlabeled learning under the Selected At Random assumption, analyzing the impact of label noise and providing near-optimal theoretical guarantees.

Contribution

It derives risk bounds for PU learning when the labeling probability depends on the covariates, quantifies the effect of label noise, and proves a lower bound on the minimax risk.
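To make "covariate-dependent labeling probabilities" concrete, here is a minimal simulation sketch of PU data under a Selected At Random (SAR) mechanism. Everything specific in it is a hypothetical choice for illustration, not taken from the paper: the covariate distribution X ~ Uniform(0, 1), the regression function eta(x) = P(Y=1 | X=x) = x, and the propensity e(x) = P(S=1 | Y=1, X=x) = 0.2 + 0.6x. The function name `simulate_sar_pu` is likewise made up.

```python
import numpy as np

def simulate_sar_pu(n, rng):
    """Draw PU data under a SAR labeling mechanism (hypothetical model).

    X ~ Uniform(0, 1)
    eta(x) = P(Y=1 | X=x) = x                       (assumed)
    e(x)   = P(S=1 | Y=1, X=x) = 0.2 + 0.6 * x      (assumed propensity)
    """
    x = rng.uniform(0.0, 1.0, size=n)
    y = rng.binomial(1, x)              # true labels, hidden from the learner
    e = 0.2 + 0.6 * x                   # labeling probability depends on x
    s = y * rng.binomial(1, e)          # only positives can ever be labeled
    return x, y, s

rng = np.random.default_rng(0)
x, y, s = simulate_sar_pu(100_000, rng)

# Labeled fraction among positives, for small and large x.
pos = y == 1
low, high = pos & (x < 0.5), pos & (x >= 0.5)
print(s[low].mean(), s[high].mean())
```

With these choices the labeled fraction among positives is roughly 0.40 for x < 0.5 and roughly 0.67 for x >= 0.5, which is exactly the kind of covariate dependence that the SAR setting allows and the simpler "Selected Completely At Random" setting (constant propensity) rules out.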

Findings

01. Risk bounds are derived under the Selected At Random assumption.
02. The impact of label noise on PU learning is quantified.
03. A lower bound on the minimax risk shows that the upper bounds are nearly optimal.
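One way to see why PU labeling behaves like label noise is the short derivation below. It is an illustrative calculation under standard definitions, not a result quoted from the paper. Write eta(x) = P(Y=1 | X=x), the propensity e(x) = P(S=1 | Y=1, X=x), and s(x) = P(S=1 | X=x) for the observed-label regression function. Since negatives are never labeled, P(S=1 | Y=0, X=x) = 0, so:

```latex
s(x) = P(S=1, Y=1 \mid X=x) = e(x)\,\eta(x),
\qquad
g^*(x) = \mathbf{1}\{\eta(x) \ge \tfrac12\} = \mathbf{1}\{s(x) \ge \tfrac{e(x)}{2}\},
\qquad
\Bigl| s(x) - \tfrac{e(x)}{2} \Bigr| = e(x)\,\Bigl| \eta(x) - \tfrac12 \Bigr|.
```

The Bayes classifier is still recoverable from the observed labels when e(x) is known, but the margin around the decision threshold is shrunk by the factor e(x). This suggests why small labeling probabilities make PU learning harder than standard classification.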

Abstract

Positive-unlabeled learning (PU learning) is known as a special case of semi-supervised binary classification where only a fraction of the positive examples are labeled. The challenge is then to find the correct classifier despite this lack of information. Recently, new methodologies have been introduced to address the case where the probability of being labeled may depend on the covariates. In this paper, we establish risk bounds for PU learning under this general assumption. In addition, we quantify the impact of label noise on PU learning compared with the standard classification setting. Finally, we provide a lower bound on the minimax risk, proving that the upper bound is almost optimal.
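A common way to evaluate a classifier from PU data when the propensity is known is inverse-propensity weighting. The sketch below shows that construction for the 0-1 loss. It is one standard estimator and is not claimed to be the estimator analyzed in the paper. It relies on the identity E[S / e(X) | X] = eta(X), so that each example's weighted loss is an unbiased estimate of its risk under the hidden true label. The function `sar_weighted_risk` and the simulated model (eta(x) = x, e(x) = 0.2 + 0.6x) are hypothetical.

```python
import numpy as np

def sar_weighted_risk(pred, s, e):
    """Unbiased estimate of the 0-1 risk of `pred` from PU data.

    Assumes the propensity e_i = P(S=1 | Y=1, X=x_i) is known and > 0.
    Each example contributes (s/e) * 1{pred != 1} + (1 - s/e) * 1{pred != 0};
    individual terms can be negative, but their mean is unbiased.
    """
    pred = np.asarray(pred)
    w = np.asarray(s) / np.asarray(e)          # weight estimating eta(x_i)
    loss_if_pos = (pred != 1).astype(float)    # loss if the true label were 1
    loss_if_neg = (pred != 0).astype(float)    # loss if the true label were 0
    return float(np.mean(w * loss_if_pos + (1.0 - w) * loss_if_neg))

# Hypothetical SAR model: eta(x) = x, e(x) = 0.2 + 0.6 x.
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.binomial(1, x)
e = 0.2 + 0.6 * x
s = y * rng.binomial(1, e)

pred = (x >= 0.5).astype(int)                  # Bayes classifier for eta(x) = x
print(sar_weighted_risk(pred, s, e), np.mean(pred != y))
```

Both printed numbers should be close to the Bayes risk E[min(X, 1 - X)] = 0.25, even though the first one never looks at the hidden labels `y`. The price is variance: weights up to 1 / min e(x) inflate it, which is one intuition for why risk bounds in the SAR setting depend on how small the propensity can get.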

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms