Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data
Farid Bagirov, Dmitry Ivanov, Aleksei Shpilman

TL;DR
This paper compares one-class (OC) and positive-unlabeled (PU) learning methods for binary classification when only one class is labeled. It shows that the better choice depends on how reliable the unlabeled data is, and it proposes modifications that remain robust when the unlabeled data is unreliable.
Contribution
The paper provides an extensive experimental comparison of OC and PU algorithms and introduces PU modifications of OC algorithms that are robust to unreliable unlabeled data.
Findings
PU algorithms perform better with reliable unlabeled data.
Modified OC algorithms are robust when unlabeled data is unreliable.
Guidelines and statistical tests help determine data reliability.
Abstract
When dealing with binary classification of data with only one labeled class, data scientists employ two main approaches, namely One-Class (OC) classification and Positive Unlabeled (PU) learning. The former learns only from labeled positive data, whereas the latter also utilizes unlabeled data to improve the overall performance. Since PU learning utilizes more data, we might be inclined to think that when unlabeled data is available, the go-to algorithms should always come from the PU group. However, we find that this is not always the case if unlabeled data is unreliable, i.e. contains limited or biased latent negative data. We perform an extensive experimental study of a wide range of state-of-the-art OC and PU algorithms in various scenarios that differ in the reliability of the unlabeled data. Furthermore, we propose PU modifications of state-of-the-art OC algorithms that are robust to…
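Illustration
The distinction between the two approaches can be made concrete with a toy example. The sketch below is not one of the algorithms evaluated in the paper; it is a minimal illustration under stated assumptions: 1-D Gaussian data, an OC scorer that fits a Gaussian to the labeled positives only, and a PU scorer that ranks points by the density ratio between labeled positives and unlabeled data. All function names (fit_oc, fit_pu, auc, run_demo) and the data parameters are hypothetical. The "unreliable" scenario uses an unlabeled set that contains no latent negatives, which is one of the cases the abstract describes.

```python
import random
from statistics import NormalDist


def fit_oc(labeled_pos):
    """OC sketch: model only the labeled positives with a 1-D Gaussian."""
    pos = NormalDist.from_samples(labeled_pos)
    return pos.pdf  # higher density means "more positive"


def fit_pu(labeled_pos, unlabeled):
    """PU sketch: score by the density ratio p_pos(x) / p_unlabeled(x).

    Assumption: each density is approximated by a single Gaussian fit.
    """
    pos = NormalDist.from_samples(labeled_pos)
    unl = NormalDist.from_samples(unlabeled)
    return lambda x: pos.pdf(x) / max(unl.pdf(x), 1e-300)


def auc(score, test_pos, test_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    sp = [score(x) for x in test_pos]
    sn = [score(x) for x in test_neg]
    wins = sum((p > n) + 0.5 * (p == n) for p in sp for n in sn)
    return wins / (len(sp) * len(sn))


def sample(rng, mean, n):
    """Draw n points from a unit-variance Gaussian centred at `mean`."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def run_demo(seed=0):
    rng = random.Random(seed)
    labeled_pos = sample(rng, 0.0, 200)   # positives centred at 0
    test_pos = sample(rng, 0.0, 300)
    test_neg = sample(rng, 3.0, 300)      # negatives centred at 3
    # Reliable unlabeled data: a mixture of positives and latent negatives.
    reliable = sample(rng, 0.0, 200) + sample(rng, 3.0, 200)
    # Unreliable unlabeled data: no latent negatives at all.
    unreliable = sample(rng, 0.0, 400)
    return {
        "oc": auc(fit_oc(labeled_pos), test_pos, test_neg),
        "pu_reliable": auc(fit_pu(labeled_pos, reliable), test_pos, test_neg),
        "pu_unreliable": auc(fit_pu(labeled_pos, unreliable), test_pos, test_neg),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: AUC = {value:.3f}")
```

In the unreliable scenario the two fitted densities nearly coincide, so the ratio carries little signal about the negative class and the PU score is driven mostly by sampling noise. The OC score does not depend on the unlabeled set at all, which is the intuition behind preferring OC-style methods when the unlabeled data cannot be trusted.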
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications
