Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement
Jack D. Saunders, Alex A. Freitas

TL;DR
This paper critically reviews how positive-unlabelled classifiers are evaluated, highlighting challenges with current metrics and offering practical recommendations to improve evaluation practices in PU learning.
Contribution
It critically reviews the evaluation approaches and predictive accuracy measures used in 51 articles proposing PU classifiers, and suggests practical improvements based on that analysis.
Findings
Many popular standard classification metrics cannot be precisely calculated in PU learning because fully labelled data are absent.
Current practices often rely on approximations that can misrepresent classifier performance.
The paper offers guidelines to enhance the reliability of PU classifier evaluation.
Abstract
Positive-Unlabelled (PU) learning is a growing area of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances. Whilst much work has been done proposing methods for PU learning, little has been written on the subject of evaluating these methods. Many popular standard classification metrics cannot be precisely calculated due to the absence of fully labelled data, so alternative approaches must be taken. This short commentary paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers and provides practical recommendations for improvements in this area.
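To make the evaluation problem concrete, the following is a minimal sketch of why standard metrics break down without fully labelled data. It is not taken from the paper: the toy data, the variable names (y_true, s, y_pred) and the helper precision_recall are all hypothetical, and the "unlabelled treated as negative" approximation is only one common shortcut. The hidden ground truth y_true is available here only because the data are synthetic; in a real PU setting only s would be observed.

```python
def precision_recall(y_ref, y_pred):
    """Precision and recall of y_pred measured against reference labels y_ref."""
    tp = sum(1 for r, p in zip(y_ref, y_pred) if r == 1 and p == 1)
    pred_pos = sum(y_pred)
    ref_pos = sum(y_ref)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / ref_pos if ref_pos else 0.0
    return precision, recall


# Hypothetical toy data: 10 true positives followed by 10 true negatives.
y_true = [1] * 10 + [0] * 10

# PU labels: only the first 4 positives are labelled; everything else is unlabelled (0).
s = [1, 1, 1, 1] + [0] * 16

# Hypothetical classifier output: finds 8 of the 10 positives, plus 2 false positives.
y_pred = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0] + [1, 1] + [0] * 8

# Metrics against the hidden ground truth (not computable in practice).
true_precision, true_recall = precision_recall(y_true, y_pred)

# Approximation: treat every unlabelled instance as a negative.
pu_precision, pu_recall = precision_recall(s, y_pred)

print(f"true precision={true_precision:.2f}, recall={true_recall:.2f}")
# prints: true precision=0.80, recall=0.80
print(f"PU   precision={pu_precision:.2f}, recall={pu_recall:.2f}")
# prints: PU   precision=0.30, recall=0.75
```

In this toy case the recall measured on the labelled positives stays close to the true value, while the approximated precision is far too low, because correctly predicted but unlabelled positives are counted as false positives. How close the recall estimate is in general depends on how the labelled positives were selected, which this sketch does not model.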
Taxonomy
Topics: Machine Learning and Data Classification
