Estimation beyond Missing (Completely) at Random
Tianyi Ma, Kabir A. Verchand, Thomas B. Berrett, Tengyao Wang, Richard J. Samworth

TL;DR
This paper develops a robust framework for estimating population parameters under various non-random missing data mechanisms, extending classical models to handle more realistic contamination scenarios.
Contribution
It introduces a new contamination model framework for missing data, providing minimax bounds and adaptive methods for mean estimation under complex missingness.
Findings
Minimax quantiles decompose into MCAR and robust error components.
Minimax performance over realisable contamination classes improves on that over the earlier arbitrary contamination classes.
Consistent mean estimation is achievable even with high levels of missingness in Gaussian models.
Abstract
We study the effects of missingness on the estimation of population parameters. Moving beyond restrictive missing completely at random (MCAR) assumptions, we first formulate a missing data analogue of Huber's arbitrary ε-contamination model. For mean estimation with respect to squared Euclidean error loss, we show that the minimax quantiles decompose as a sum of two terms: the corresponding minimax quantiles under a heterogeneous MCAR assumption, and a robust error term, depending on ε, that reflects the additional error incurred by departure from MCAR. We next introduce natural classes of realisable ε-contamination models, where an MCAR version of a base distribution is contaminated by an arbitrary missing not at random (MNAR) version of the same base distribution. These classes are rich enough to capture various notions of biased sampling and sensitivity conditions, yet we show that…
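The contamination idea can be made concrete with a small simulation. The sketch below is a simplified, hypothetical illustration using NumPy, not the authors' formal model or estimator. It assumes a univariate N(μ, 1) sample. With probability 1 - ε, a point is hidden independently of its value with probability q (MCAR). With probability ε, an adversary decides its missingness and reveals it only when it falls below μ (one MNAR mechanism). The helper names `simulate_observed` and `complete_case_mean` are hypothetical.

```python
import numpy as np


def simulate_observed(n, mu, q, eps, rng):
    """Draw n scalar samples X ~ N(mu, 1) and return only the revealed values.

    Hypothetical toy model (not the paper's formal definition):
    - with probability 1 - eps a sample is 'clean' and is revealed
      independently of its value with probability 1 - q (MCAR);
    - with probability eps it is 'contaminated' and is revealed only
      when X < mu (an adversarial MNAR mechanism).
    """
    x = rng.normal(mu, 1.0, size=n)
    contaminated = rng.random(n) < eps
    mcar_revealed = rng.random(n) >= q   # value-independent masking
    mnar_revealed = x < mu               # masking depends on the value itself
    revealed = np.where(contaminated, mnar_revealed, mcar_revealed)
    return x[revealed]


def complete_case_mean(observed):
    """Naive estimator: average of the non-missing values only."""
    return float(observed.mean())


rng = np.random.default_rng(0)
for eps in (0.0, 0.1, 0.3):
    obs = simulate_observed(200_000, mu=1.0, q=0.5, eps=eps, rng=rng)
    print(f"eps={eps:.1f}  complete-case mean={complete_case_mean(obs):.3f}")
```

Under these assumptions with q = 0.5, a short calculation gives an expected complete-case mean of μ - √(2/π)·ε ≈ μ - 0.8ε. The naive estimator is therefore unbiased when ε = 0 (pure MCAR), and its bias grows linearly in ε. This only illustrates why departures from MCAR cost accuracy. It does not reproduce the paper's minimax rates or its adaptive procedures.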
