Augmented prediction of a true class for Positive Unlabeled data under selection bias

Jan Mielniczuk; Adam Wawrzeńczyk

arXiv:2407.10309 · stat.ML · July 16, 2024

TL;DR

This paper introduces an augmented Positive Unlabeled (PU) prediction setting in which observations at prediction time also carry labels and labeling may depend on features. It compares several variants of the empirical Bayes rule, including one based on a variational autoencoder.

Contribution

It formalizes the augmented PU prediction problem, establishes the Bayes classifier under this setting, and evaluates novel empirical Bayes variants including a variational autoencoder approach.

Findings

01 The autoencoder-based variant performs on par with or better than the other variants.

02 Using the labels observed at prediction time improves accuracy over a classifier that relies on predictors alone for unlabeled data.

03 Applying a classical classification rule without accounting for the augmented setting can skew predictions.

Abstract

We introduce a new observational setting for Positive Unlabeled (PU) data in which the observations at prediction time are also labeled. This occurs commonly in practice -- we argue that the additional information is important for prediction, and call this task "augmented PU prediction". We allow labeling to be feature dependent. In this scenario, the Bayes classifier and its risk are established and compared with the risk of a classifier which, for unlabeled data, is based only on predictors. We introduce several variants of the empirical Bayes rule in this scenario and investigate their performance. We emphasise the dangers (and ease) of applying a classical classification rule in the augmented PU scenario -- due to the lack of preexisting studies, an unaware researcher is prone to skewing the obtained predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for…
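
To make the contrast in the abstract concrete, the following is a minimal sketch of an augmented decision rule next to a predictors-only rule for a single unlabeled observation. It is not the paper's implementation. It assumes the usual PU conventions: only positives can be labeled (s = 1 implies y = 1), p_x stands for P(y=1 | x), and e_x stands for the propensity P(s=1 | x, y=1). Both are treated as known here, whereas in practice they would have to be estimated. All function and variable names are hypothetical.

```python
def posterior_unlabeled(p_x: float, e_x: float) -> float:
    """P(y=1 | x, s=0) obtained from Bayes' theorem.

    p_x: assumed class posterior P(y=1 | x).
    e_x: assumed propensity score P(s=1 | x, y=1).
    """
    prob_unlabeled = 1.0 - e_x * p_x  # P(s=0 | x), since only positives get labeled
    if prob_unlabeled <= 0.0:
        raise ValueError("P(s=0 | x) must be positive for an unlabeled point")
    return (1.0 - e_x) * p_x / prob_unlabeled


def augmented_rule(p_x: float, e_x: float, s: int, threshold: float = 0.5) -> int:
    """Predict y using both the predictors and the label indicator s."""
    if s == 1:
        return 1  # a labeled observation is positive by the PU assumption
    return int(posterior_unlabeled(p_x, e_x) > threshold)


def predictors_only_rule(p_x: float, threshold: float = 0.5) -> int:
    """Predict y for an unlabeled point from P(y=1 | x) alone, ignoring s=0."""
    return int(p_x > threshold)


if __name__ == "__main__":
    p_x, e_x = 0.6, 0.5  # illustrative values, not taken from the paper
    print(posterior_unlabeled(p_x, e_x))     # posterior after seeing s=0
    print(augmented_rule(p_x, e_x, s=0))     # decision that uses s
    print(predictors_only_rule(p_x))         # decision that ignores s
```

With these illustrative values, P(y=1 | x) exceeds one half, yet the posterior given s = 0 falls below it, so the two rules disagree on the same unlabeled point. This shows how ignoring the label indicator at prediction time can change the decision.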

Code & Models

Repositories

wawrzenczyka/VP-Bayes-S (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Imbalanced Data Classification Techniques