High-dimensional estimation with missing data: Statistical and computational limits
Kabir Aladin Verchand, Ankit Pensia, Saminul Haque, Rohith Kuditipudi

TL;DR
This paper investigates the limits of statistically optimal and computationally feasible methods for high-dimensional parameter estimation with missing data, revealing gaps in certain problems and proposing algorithms that nearly attain theoretical bounds.
Contribution
The paper provides evidence of statistical-computational gaps in high-dimensional mean and covariance estimation with missing data and introduces algorithms that approach the corresponding limits. Linear regression is the exception: no such gap appears there.
Findings
Evidence of a statistical-computational gap in Gaussian mean estimation with missing data.
Sum-of-squares algorithms nearly achieve the sample complexity attainable by computationally efficient methods.
Linear regression with missing data does not exhibit a computational gap.
Abstract
We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an ε fraction of the observations are subject to an arbitrary (and unknown) missing not at random (MNAR) mechanism. When the true data is Gaussian, we provide evidence towards statistical-computational gaps in several problems. For mean estimation in norm, we show that in order to obtain error at most , for any constant contamination , (roughly) samples are necessary and that there is a computationally-inefficient algorithm which achieves this error. On the other hand, we show that any computationally-efficient method within certain popular families of algorithms requires a much larger sample…
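To make the contamination model concrete, here is a minimal simulation sketch. It is not taken from the paper. The particular missingness mechanism (hide every coordinate that exceeds the true mean), the choice to contaminate each row independently with probability `eps`, and all names (`sample_with_mnar`, `available_case_mean`) are assumptions chosen for illustration. It shows that averaging only the observed entries is biased once missingness depends on the hidden values.

```python
import math
import random


def sample_with_mnar(n, d, mu, eps, rng):
    """Draw n vectors from N(mu * 1, I_d) and contaminate roughly an eps fraction.

    Hypothetical MNAR mechanism (an assumption, not the paper's): in a
    contaminated row, every coordinate larger than mu is hidden (set to None),
    so whether an entry is missing depends on its unobserved value.
    """
    rows = []
    for _ in range(n):
        x = [rng.gauss(mu, 1.0) for _ in range(d)]
        if rng.random() < eps:  # this row is subject to the MNAR mechanism
            x = [None if v > mu else v for v in x]
        rows.append(x)
    return rows


def available_case_mean(rows, d):
    """Naive estimator: average each coordinate over its observed entries."""
    means = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


rng = random.Random(0)
mu, eps, n, d = 0.0, 0.2, 20000, 5
rows = sample_with_mnar(n, d, mu, eps, rng)
naive = available_case_mean(rows, d)

# Bias of the naive estimator under this specific mechanism:
# a fraction eps/2 of entries are observed only when they fall below mu,
# and E[Z | Z <= 0] = -sqrt(2/pi) for a standard normal Z.
predicted_bias = -(eps / 2) * math.sqrt(2 / math.pi) / (1 - eps / 2)

print("naive estimate per coordinate:", [round(m, 3) for m in naive])
print("predicted bias:", round(predicted_bias, 3))
```

Under these assumptions the naive estimate sits below the true mean in every coordinate, and the bias does not shrink as `n` grows. That is why estimators in this setting have to account for the unknown mechanism rather than simply discard missing entries.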
Taxonomy
Topics: Statistical Methods and Inference · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms
