Verifying the Selected Completely at Random Assumption in Positive-Unlabeled Learning
Paweł Teisseyre, Konrad Furmańczyk, Jan Mielniczuk

TL;DR
This paper introduces a fast statistical test for checking whether positive-unlabeled (PU) data satisfy the SCAR assumption, which helps in choosing an algorithm suited to the labeling mechanism.
Contribution
The authors propose a simple, computationally efficient test to determine if data satisfies the SCAR assumption in PU learning, with theoretical justification and empirical validation.
Findings
The test effectively detects deviations from SCAR.
It controls the type I error well in experiments.
It can be used as a pre-processing step for PU algorithms.
Abstract
The goal of positive-unlabeled (PU) learning is to train a binary classifier on the basis of training data containing positive and unlabeled instances, where unlabeled observations can belong either to the positive class or to the negative class. Modeling PU data requires certain assumptions on the labeling mechanism that describes which positive observations are assigned a label. The simplest assumption, considered in early works, is SCAR (Selected Completely at Random Assumption), according to which the propensity score function, defined as the probability of assigning a label to a positive observation, is constant. On the other hand, a much more realistic assumption is SAR (Selected at Random), which states that the propensity function solely depends on the observed feature vector. SCAR-based algorithms are much simpler and computationally much faster compared to SAR-based…
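To make the two labeling mechanisms concrete, here is a minimal simulation sketch. It is not the test proposed in the paper. The data-generating model (one Gaussian feature, a logistic class posterior), the propensity functions, and the helper names simulate_pu and mean_gap are all assumptions chosen for illustration. Under SCAR the labeled positives should resemble a random subsample of all positives, so their feature mean stays close to that of the whole positive class; under SAR with a feature-dependent propensity the two means drift apart.

```python
import math
import random


def simulate_pu(n, propensity, seed=0):
    """Draw n triples (x, y, s). Hypothetical model, for illustration only:
    x ~ N(0, 1), y ~ Bernoulli(sigmoid(x)), and s = 1 only for positives,
    each labeled with probability propensity(x)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0
        s = 1 if y == 1 and rng.random() < propensity(x) else 0
        data.append((x, y, s))
    return data


def mean_gap(data):
    """Mean of x among labeled positives minus mean of x among all positives."""
    labeled = [x for x, y, s in data if s == 1]
    positives = [x for x, y, s in data if y == 1]
    return sum(labeled) / len(labeled) - sum(positives) / len(positives)


# SCAR: the propensity is the same constant for every positive observation.
scar = simulate_pu(20000, lambda x: 0.5)
# SAR: the propensity depends on the observed feature x.
sar = simulate_pu(20000, lambda x: 1.0 / (1.0 + math.exp(-2.0 * x)))

gap_scar = mean_gap(scar)
gap_sar = mean_gap(sar)
print(f"SCAR gap: {gap_scar:.3f}, SAR gap: {gap_sar:.3f}")
```

This comparison is only possible in a simulation, because it uses the true class y. In real PU data only x and the label indicator s are observed, which is why a dedicated test is needed.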
Taxonomy
Topics: Educational Assessment and Pedagogy · Educational Technology and Assessment · Intelligent Tutoring Systems and Adaptive Learning
