Prediction in the presence of response-dependent missing labels

Hyebin Song; Garvesh Raskutti; Rebecca Willett

arXiv:2103.13555 · stat.ML · March 26, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a novel method for handling response-dependent missing labels by jointly estimating event occurrence and detection likelihood, improving accuracy in datasets with biased missingness.

Contribution

The paper proposes a new non-convex algorithm that leverages prior knowledge to jointly estimate event occurrence and detection probability, addressing response-dependent missing labels.

Findings

01

Method outperforms existing approaches on synthetic data.

02

Algorithm achieves geometric convergence rates.

03

Effective on a real wildfire detection dataset.

Abstract

In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations where pseudo-negative observations can be either true negatives or undetected positives with small magnitudes. We develop a new methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture) which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU)-learning and zero-inflated models that jointly estimate the magnitude and…
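The setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch of the data-generating situation only, not the authors' PU-OMM algorithm: the logistic occurrence model, the log-normal magnitude model, the hard detection threshold, and every name and coefficient below (`simulate`, `threshold`, `0.5`, `0.3`) are assumptions chosen for illustration. It shows why a pseudo-negative label is ambiguous: an event is recorded as positive only if it occurred and its magnitude exceeded the detection threshold, so the observed positive rate understates the true occurrence rate.

```python
import math
import random


def simulate(n=20000, threshold=1.0, seed=0):
    """Draw (x, occurred, magnitude, detected) tuples.

    All modelling choices here are hypothetical and for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                          # single covariate
        p_occ = 1.0 / (1.0 + math.exp(-(x - 0.5)))       # assumed occurrence model
        occurred = rng.random() < p_occ
        # Zero-inflated response: magnitude is 0 unless the event occurred.
        magnitude = math.exp(rng.gauss(0.3 * x, 1.0)) if occurred else 0.0
        # Response-dependent missingness: small events go undetected.
        detected = occurred and magnitude > threshold
        rows.append((x, occurred, magnitude, detected))
    return rows


rows = simulate()
n = len(rows)

true_rate = sum(r[1] for r in rows) / n        # fraction of events that occurred
observed_rate = sum(r[3] for r in rows) / n    # fraction labelled positive

# Pseudo-negatives mix true negatives with undetected positives.
pseudo_negatives = [r for r in rows if not r[3]]
hidden_positive_share = sum(r[1] for r in pseudo_negatives) / len(pseudo_negatives)

print(f"true occurrence rate:        {true_rate:.3f}")
print(f"observed positive rate:      {observed_rate:.3f}")
print(f"positives among pseudo-negs: {hidden_positive_share:.3f}")
```

A classifier trained naively on `detected` would treat the hidden positives as negatives, and because detection depends on the magnitude itself, the missingness is not random given `x`. This is the bias that a joint model of occurrence and detection is meant to correct.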

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring