Recovering True Classifier Performance in Positive-Unlabeled Learning

Shantanu Jain; Martha White; Predrag Radivojac

arXiv:1702.00518 · stat.ML · February 3, 2017 · 5 cites

TL;DR

This paper presents methods that use knowledge of the class priors to correct biased performance estimates in positive-unlabeled learning, extends them to settings where some labeled positives are noisy, and demonstrates their effectiveness on real data.

Contribution

It introduces correction techniques for performance measures in positive-unlabeled learning that account for class priors and label noise, improving evaluation accuracy.

Findings

01

Performance measures estimated on labeled-versus-unlabeled data, such as the ROC and precision-recall curves, can be corrected given the class priors.

02

The correction extends to noisy labeled positives, provided the proportion of noisy examples among them is also known.

03

Experimental results validate the correction approaches.
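
The first finding can be illustrated with a minimal sketch. It assumes the unlabeled set is a mixture containing a known fraction `alpha` of positives and that the labeled positives are clean. The function names are hypothetical, and the formulas are derived here from that mixture assumption rather than quoted from the paper.

```python
def correct_pu_rates(tpr, fpr_pu, alpha):
    """Correct rates measured on positive-vs-unlabeled data (sketch).

    tpr    : fraction of labeled positives predicted positive
    fpr_pu : fraction of unlabeled examples predicted positive
    alpha  : assumed known fraction of positives in the unlabeled data
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Unlabeled data is a mixture: fpr_pu = alpha * tpr + (1 - alpha) * fpr
    fpr = (fpr_pu - alpha * tpr) / (1.0 - alpha)
    fpr = min(1.0, max(0.0, fpr))  # clip away estimation noise
    # Precision in the unlabeled population: true positives / predicted positives
    precision = min(1.0, alpha * tpr / fpr_pu) if fpr_pu > 0 else float("nan")
    return fpr, precision


def correct_pu_auc(auc_pu, alpha):
    """Correct an AUC computed with unlabeled examples treated as negatives.

    Uses auc_pu = alpha * 0.5 + (1 - alpha) * auc, which holds under the
    same mixture assumption.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (auc_pu - alpha / 2.0) / (1.0 - alpha)
```

Because unlabeled positives are counted as negatives, the uncorrected false positive rate is inflated and the uncorrected AUC is deflated; both corrections grow with `alpha`.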

Abstract

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic curve, or the precision-recall curve obtained on such data can be corrected with the knowledge of class priors; i.e., the proportions of the positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative and show that the correction also requires the knowledge of the proportion of noisy examples in the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the…
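
The noisy setting described in the abstract can be sketched as a two-equation linear system. This is an illustration under stated assumptions, not the paper's own formulation: `alpha` is the fraction of positives in the unlabeled data, `beta` is the fraction of labeled examples that are truly positive (so `1 - beta` is the noise proportion), both are assumed known, and the function name is hypothetical.

```python
def correct_noisy_pu_rates(tpr_pu, fpr_pu, alpha, beta):
    """Recover true TPR and FPR when labeled positives are noisy (sketch).

    tpr_pu : fraction of labeled examples predicted positive
    fpr_pu : fraction of unlabeled examples predicted positive
    alpha  : assumed fraction of true positives in the unlabeled data
    beta   : assumed fraction of true positives in the labeled data
    """
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("requires 0 <= alpha < beta <= 1")
    # Both samples are mixtures of the true positive and negative classes:
    #   tpr_pu = beta  * tpr + (1 - beta)  * fpr
    #   fpr_pu = alpha * tpr + (1 - alpha) * fpr
    # Solving the 2x2 system gives:
    denom = beta - alpha
    tpr = ((1.0 - alpha) * tpr_pu - (1.0 - beta) * fpr_pu) / denom
    fpr = (beta * fpr_pu - alpha * tpr_pu) / denom

    def clip(x):
        # Clip to [0, 1] to absorb estimation noise
        return min(1.0, max(0.0, x))

    return clip(tpr), clip(fpr)
```

With `beta = 1` the labeled set is clean and the system reduces to the noise-free case, where the measured true positive rate needs no correction.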

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms