Robust and Efficient Imbalanced Positive-Unlabeled Learning with Self-supervision

Emilio Dorigatti; Jonas Schweisthal; Bernd Bischl; Mina Rezaei

arXiv:2209.02459 · cs.LG · September 7, 2022 · 1 citation


Open Access · 1 Repo

TL;DR

This paper introduces ImPULSeS, a self-supervised learning framework designed to improve positive-unlabeled (PU) learning in imbalanced datasets, achieving lower error rates and greater robustness than previous methods.

Contribution

It presents a novel self-supervised pretraining approach with debiased contrastive loss and reweighted PU loss for imbalanced PU learning, outperforming prior state-of-the-art methods.
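The page does not spell out the reweighted PU loss, so the following is only an orientation aid: a minimal sketch of the widely used non-negative PU risk estimator (nnPU, Kiryo et al., 2017), which is a common starting point for PU losses and is not claimed to be the loss used in ImPULSeS. The function names, the choice of sigmoid surrogate loss, and the example scores are assumptions made for illustration.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when the sign of score matches label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (illustrative, not the paper's loss).

    pos_scores: classifier scores on labeled positive samples
    unl_scores: classifier scores on unlabeled samples
    prior:      assumed class prior P(y = +1), strictly between 0 and 1
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    # Risk of positives when treated as positive and as negative.
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of unlabeled samples when treated as negative.
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # Unlabeled data mixes positives and negatives, so the positive share is
    # subtracted; clipping at zero keeps the negative-class term non-negative.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


# Hypothetical scores; a small prior mimics an imbalanced dataset.
print(nn_pu_risk([2.0, 1.5], [-1.0, 0.3, -2.0], prior=0.1))
```

In an imbalanced setting the class prior is small, so the positive term contributes little to the total risk. This is one reason a reweighting of the PU loss can matter, and also why a misspecified prior can change the estimate.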

Findings

01. Halves the error rate compared to previous methods.

02. Shows robustness to misspecification of the class prior.

03. Performs well even with unrelated pretraining data.

Abstract

Learning from positive and unlabeled (PU) data is a setting in which the learner has access only to positive and unlabeled samples and has no information on negative examples. This PU setting is of great importance in tasks such as medical diagnosis, social network analysis, financial market analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., most examples are actually negatives. Most existing approaches to PU learning, however, consider only artificially balanced datasets, and it is unclear how well they perform in the realistic scenario of imbalanced, long-tailed data distributions. This paper proposes to tackle this challenge via robust and efficient self-supervised pretraining. However, conventional self-supervised learning methods need to be reformulated before they can be applied to highly imbalanced PU distributions. In…


Code & Models

Repositories

jschweisthal/impulses (PyTorch, official)


Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification

Methods: Balanced Selection