Prediction in the presence of response-dependent missing labels
Hyebin Song, Garvesh Raskutti, Rebecca Willett

TL;DR
This paper introduces a novel method for handling response-dependent missing labels by jointly estimating event occurrence and detection likelihood, improving accuracy in datasets with biased missingness.
Contribution
The paper proposes a new non-convex algorithm that leverages prior knowledge to jointly estimate event occurrence and detection probability, addressing response-dependent missing labels.
Findings
The method outperforms existing approaches on synthetic data.
The algorithm achieves a geometric convergence rate.
The method is effective on a real wildfire detection dataset.
Abstract
In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations where pseudo-negative observations can be either true negatives or undetected positives with small magnitudes. We develop a new methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture) which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU)-learning and zero-inflated models that jointly estimate the magnitude and…
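To make the setting in the abstract concrete, the following is a minimal simulation sketch of response-dependent missing labels, not the authors' model or algorithm. It assumes a logistic occurrence probability, a log-normal magnitude, and a hard detection threshold; all of these choices, along with the names `simulate_pu_labels`, `threshold`, `y`, and `z`, are hypothetical and introduced only for illustration. The point it shows is that an observed label `z` is positive only when the event both occurs and is large enough to be detected, so the pseudo-negatives mix true negatives with undetected positives.

```python
import math
import random


def simulate_pu_labels(n=5000, threshold=1.0, seed=0):
    """Simulate (x, y, magnitude, z) tuples under illustrative assumptions.

    x         : a single scalar feature
    y         : true occurrence of the event (never fully observed in practice)
    magnitude : size of the event, 0.0 when no event occurs
    z         : observed label, positive only if the event occurs AND is detected
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Assumed occurrence model: logistic in x.
        p_occurrence = 1.0 / (1.0 + math.exp(-x))
        y = rng.random() < p_occurrence
        # Assumed magnitude model: log-normal whose location depends on x.
        magnitude = rng.lognormvariate(0.5 * x, 1.0) if y else 0.0
        # Assumed detection mechanism: events at or below the threshold are missed.
        z = y and magnitude > threshold
        rows.append((x, y, magnitude, z))
    return rows


rows = simulate_pu_labels()
n_true = sum(1 for _, y, _, _ in rows if y)
n_observed = sum(1 for _, _, _, z in rows if z)
n_hidden = sum(1 for _, y, _, z in rows if y and not z)
print(f"true positives: {n_true}, observed: {n_observed}, undetected: {n_hidden}")
```

A classifier trained directly on `z` would treat the undetected positives as negatives, and because detection depends on magnitude, the resulting bias varies with `x` rather than being uniform.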
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring
