Density Ratio Estimation and Neyman Pearson Classification with Missing Data

Josh Givens; Song Liu; Henry W J Reeve

arXiv:2302.10655·stat.ML·February 22, 2023

Open Access

TL;DR

This paper addresses density ratio estimation when data are missing not at random (MNAR). It proposes M-KLIEP, an adaptation of KLIEP that restores consistency, gives finite-sample estimation error bounds for it, and adapts Neyman-Pearson classification to the MNAR setting.

Contribution

The paper introduces M-KLIEP, a density ratio estimation method for MNAR data, together with finite-sample estimation error bounds and an adaptation of Neyman-Pearson classification to this setting.

Findings

01

M-KLIEP restores consistency of density ratio estimation under MNAR data, where standard DRE methods give biased results.

02

Finite-sample estimation error bounds show that M-KLIEP is minimax optimal with respect to both sample size and worst-case missingness.

03

The adapted Neyman-Pearson classifier both controls Type I error and achieves high power, with high probability.

Abstract

Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with missing not at random (MNAR) data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this MNAR setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance on both synthetic data and real-world data with simulated missingness.
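
For readers unfamiliar with KLIEP, the procedure that M-KLIEP adapts, the following is a minimal sketch of standard KLIEP on fully observed one-dimensional data. It is not the paper's M-KLIEP, and it does not handle missingness; the abstract does not describe how that adaptation works. The log-linear model log r(x) = theta * x - log_norm, the function name kliep_fit, the step size, and the Gaussian toy data are all assumptions chosen for illustration.

```python
import numpy as np

def kliep_fit(x_num, x_den, n_steps=500, lr=0.1):
    """Standard KLIEP with a one-parameter log-linear model (illustrative only).

    Maximises  mean_num(theta * x) - log mean_den(exp(theta * x))
    by plain gradient ascent. The objective is concave in theta.
    """
    theta = 0.0
    for _ in range(n_steps):
        # Self-normalised weights of the denominator samples under the current model.
        w = np.exp(theta * x_den)
        w /= w.sum()
        # Gradient: numerator sample mean minus the tilted denominator mean.
        grad = x_num.mean() - np.sum(w * x_den)
        theta += lr * grad
    # Normaliser so that the estimated ratio averages to 1 over the denominator sample.
    log_norm = np.log(np.mean(np.exp(theta * x_den)))
    return theta, log_norm

# Toy data (an assumption): numerator N(1, 1), denominator N(0, 1).
# The true log ratio is then x - 0.5, i.e. theta = 1 and log_norm = 0.5.
rng = np.random.default_rng(0)
x_num = rng.normal(1.0, 1.0, size=5000)
x_den = rng.normal(0.0, 1.0, size=5000)
theta, log_norm = kliep_fit(x_num, x_den)

def ratio(x):
    """Estimated density ratio p_num(x) / p_den(x) under the fitted model."""
    return np.exp(theta * x - log_norm)
```

In this toy setting the fitted parameters should land close to the true values because both samples are fully observed. The abstract's point is that when observations are MNAR, applying a standard procedure like this one to the observed data gives biased results, which is the gap M-KLIEP is designed to close.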

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Domain Adaptation and Few-Shot Learning