Positive and Unlabeled Learning through Negative Selection and Imbalance-aware Classification

Marco Frasca; Nicolò Cesa-Bianchi

arXiv:1805.07331 · cs.LG · January 28, 2019

TL;DR

This paper introduces a novel learning algorithm for positive and unlabeled data that combines active negative example selection with imbalance-aware classification, improving performance in protein function prediction tasks.

Contribution

It presents a new method integrating active learning and imbalance-aware techniques specifically for PU learning, addressing label scarcity and class imbalance.

Findings

01. Outperforms state-of-the-art methods on protein function prediction benchmarks

02. Active negative selection and imbalance-aware learning work synergistically

03. Effective in scenarios with scarce positive labels and no explicit negatives

Abstract

Motivated by applications in protein function prediction, we consider a challenging supervised classification setting in which positive labels are scarce and there are no explicit negative labels. The learning algorithm must thus select which unlabeled examples to use as negative training points, possibly ending up with an unbalanced learning problem. We address these issues by proposing an algorithm that combines active learning (for selecting negative examples) with imbalance-aware learning (for mitigating the label imbalance). In our experiments we observe that these two techniques operate synergistically, outperforming state-of-the-art methods on standard protein function prediction benchmarks.
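
The loop described in the abstract (pick negatives from the unlabeled pool, then train with a cost that offsets the resulting imbalance) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not state the selection criterion or the classifier, so the code assumes uncertainty sampling (unlabeled points closest to the current decision boundary are taken as negatives) and a cost-sensitive logistic regression whose positive-class weight is the negative-to-positive ratio. The names `train_weighted_logreg`, `score`, and `pu_fit`, and the parameters `rounds` and `batch`, are hypothetical.

```python
import math
import random


def train_weighted_logreg(X, y, pos_weight, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    Errors on positive examples are scaled by pos_weight (cost-sensitive loss).
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = (pos_weight if yi == 1 else 1.0) * (p - yi)
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    """Signed distance-like score; positive values lean toward the positive class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def pu_fit(positives, unlabeled, rounds=3, batch=5, seed=0):
    """Alternate negative selection and imbalance-aware training.

    Assumption: negatives are chosen by uncertainty sampling, which is one
    possible active-learning criterion and may differ from the paper's.
    """
    rng = random.Random(seed)
    pool = list(range(len(unlabeled)))
    rng.shuffle(pool)
    negatives = pool[:batch]  # random initial set of presumed negatives
    pool = pool[batch:]

    def fit():
        X = positives + [unlabeled[i] for i in negatives]
        y = [1] * len(positives) + [0] * len(negatives)
        # Assumed imbalance correction: weight positives by the class ratio.
        return train_weighted_logreg(X, y, len(negatives) / len(positives))

    for _ in range(rounds):
        w, b = fit()
        # Move the unlabeled points nearest the boundary into the negative set.
        pool.sort(key=lambda i: abs(score(w, b, unlabeled[i])))
        negatives = negatives + pool[:batch]
        pool = pool[batch:]
    w, b = fit()  # final model on the full selected training set
    return w, b, negatives
```

Calling `pu_fit(positives, unlabeled)` with lists of feature vectors returns the learned weights, the bias, and the indices of the unlabeled points that were used as negatives. Treating boundary points as negatives risks mislabeling hidden positives, which is one reason the selection rule here should be read as a placeholder rather than a recommendation.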

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms