TL;DR
This paper addresses learning from positive and unlabeled (PU) data with selection biases. It proposes methods that incorporate the labeling mechanism and handle unknown biases, and these methods improve classifier performance.
Contribution
It introduces a theoretically grounded, empirical-risk-based method for biased PU learning. It also studies the assumptions under which learning is possible when the labeling mechanism is unknown, and it proposes a practical method for that case.
Findings
Incorporating the labeling mechanism improves classifier accuracy.
The proposed methods are effective even when the bias is unknown.
The empirical results support the theoretical analysis of the approach.
Abstract
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
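The abstract does not spell out the estimator. One common way to incorporate a known labeling mechanism, assumed here, is to describe it with a propensity score e(x) = Pr(s = 1 | y = 1, x) and reweight the empirical risk. Under this scheme, each labeled example counts as 1/e(x) positives and (1 - 1/e(x)) negatives, and each unlabeled example counts as a negative. The following is a minimal NumPy sketch under that assumption. The function names `log_loss` and `propensity_weighted_risk` are hypothetical, and this is not presented as the paper's exact method.

```python
import numpy as np


def log_loss(scores, y):
    """Per-example binary cross-entropy for predicted Pr(y=1 | x) in (0, 1)."""
    eps = 1e-12
    p = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def propensity_weighted_risk(scores, s, e):
    """Empirical risk on PU data, assuming a known propensity score per example.

    scores: predicted Pr(y=1 | x) for each example
    s:      1 if the example is labeled (hence positive), 0 if unlabeled
    e:      assumed propensity e(x) = Pr(s=1 | y=1, x); must be > 0
    """
    scores = np.asarray(scores, dtype=float)
    s = np.asarray(s, dtype=float)
    inv_e = 1.0 / np.asarray(e, dtype=float)

    loss_pos = log_loss(scores, 1.0)  # loss if the example were positive
    loss_neg = log_loss(scores, 0.0)  # loss if the example were negative

    # Labeled example: weight 1/e as positive and (1 - 1/e) <= 0 as negative.
    # The negative weight cancels the unlabeled positives that are
    # otherwise counted as negatives below.
    labeled_term = s * (inv_e * loss_pos + (1.0 - inv_e) * loss_neg)
    # Unlabeled example: treated as negative.
    unlabeled_term = (1.0 - s) * loss_neg
    return float(np.mean(labeled_term + unlabeled_term))


# Toy data: four positives with e = 0.5, so exactly half of them are labeled.
scores = np.array([0.8, 0.8, 0.8, 0.8, 0.3, 0.3])
y = np.array([1, 1, 1, 1, 0, 0])  # true classes, unseen by the PU learner
s = np.array([1, 1, 0, 0, 0, 0])  # observed labels
e = np.full(6, 0.5)

print(propensity_weighted_risk(scores, s, e))
print(float(np.mean(log_loss(scores, y))))  # fully supervised risk
```

On this toy set, exactly a fraction e of the positives is labeled, so the weighted risk equals the fully supervised risk. The naive PU risk, obtained by setting e = 1 and treating every unlabeled example as negative, does not match it. In expectation over the labeling draws, the same cancellation holds whenever the propensity scores are correct, which is why the weighting is tied to knowing the labeling mechanism.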
