Learning from Positive and Unlabeled Data with Augmented Classes

Zhongnian Li, Liutao Yang, Zhongchen Ma, Tongfeng Sun, Xinzheng Xu, and Daoqiang Zhang

arXiv:2207.13274·cs.LG·July 28, 2022

Open Access

TL;DR

This paper introduces a novel unbiased risk estimator for positive and unlabeled learning that accounts for augmented classes, improving adaptability in dynamic real-world scenarios.

Contribution

It proposes a new estimator for PU learning with augmented classes and provides theoretical guarantees for its convergence.

Findings

01. Effective on multiple realistic datasets

02. Outperforms existing PU learning methods

03. Theoretically guarantees convergence

Abstract

Positive-unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled data, a setting that arises in many real-world scenarios. However, existing PU learning algorithms cannot handle open and changing environments in which examples from unobserved augmented classes may emerge in the testing phase. In this paper, we propose an unbiased risk estimator for PU learning with Augmented Classes (PUAC) that utilizes unlabeled data from the augmented-class distribution, which can be easily collected in many real-world scenarios. In addition, we derive an estimation error bound for the proposed estimator, which provides a theoretical guarantee for its convergence to the optimal solution. Experiments on multiple realistic datasets demonstrate the effectiveness of the proposed approach.
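For readers unfamiliar with unbiased risk estimation in PU learning, the sketch below shows the classical estimator for the ordinary two-class PU setting, not the PUAC estimator proposed in this paper, whose exact form is not given in the abstract. It assumes the class prior is known and uses a sigmoid loss; the names `sigmoid_loss`, `upu_risk`, `scores_pos`, `scores_unl`, and `class_prior` are illustrative choices, not taken from the paper.

```python
import numpy as np


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)), evaluated on the margin z = y * g(x)."""
    return 1.0 / (1.0 + np.exp(margin))


def upu_risk(scores_pos, scores_unl, class_prior, loss=sigmoid_loss):
    """Classical unbiased PU risk (standard two-class setting, NOT PUAC).

    R(g) = pi * E_p[l(g(x), +1)] + E_u[l(g(x), -1)] - pi * E_p[l(g(x), -1)]

    scores_pos  : classifier outputs g(x) on labeled positive examples
    scores_unl  : classifier outputs g(x) on unlabeled examples
    class_prior : pi = P(y = +1), assumed known here
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)

    pos_as_pos = loss(scores_pos).mean()    # positives scored against label +1
    pos_as_neg = loss(-scores_pos).mean()   # positives scored against label -1
    unl_as_neg = loss(-scores_unl).mean()   # unlabeled scored against label -1

    # The negative-class risk is recovered from unlabeled data by
    # subtracting the positive contribution hidden inside it.
    return class_prior * pos_as_pos + unl_as_neg - class_prior * pos_as_neg
```

The subtraction term is what makes the estimate unbiased without any labeled negatives. Handling augmented classes would require additional terms for the unlabeled data drawn from the augmented-class distribution, as the abstract describes; those are not sketched here.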

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques