High-dimensional estimation with missing data: Statistical and computational limits

Kabir Aladin Verchand; Ankit Pensia; Saminul Haque; Rohith Kuditipudi

arXiv:2603.16712·math.ST·March 18, 2026

TL;DR

This paper investigates the limits of statistically optimal and computationally feasible methods for high-dimensional parameter estimation with missing data, revealing gaps in certain problems and proposing algorithms that nearly attain theoretical bounds.

Contribution

It provides evidence of statistical-computational gaps in high-dimensional mean and covariance estimation under missing data and introduces algorithms that approach the corresponding limits. Linear regression is the exception: there, no such gap exists.

Findings

01 Statistical-computational gap in mean estimation with missing data.

02 Sum-of-squares algorithms nearly achieve optimal sample complexity.

03 Linear regression with missing data does not exhibit a computational gap.

Abstract

We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $\epsilon$ fraction of the observations are subject to an arbitrary (and unknown) missing not at random (MNAR) mechanism. When the true data is Gaussian, we provide evidence towards statistical-computational gaps in several problems. For mean estimation in $\ell_2$ norm, we show that in order to obtain error at most $\rho$, for any constant contamination $\epsilon \in (0, 1)$, (roughly) $n \gtrsim d e^{1/\rho^2}$ samples are necessary and that there is a computationally-inefficient algorithm which achieves this error. On the other hand, we show that any computationally-efficient method within certain popular families of algorithms requires a much larger sample…
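To make the contamination model in the abstract concrete, the following is a minimal simulation sketch, not the authors' construction. It assumes identity-covariance Gaussian data, treats each observation as contaminated independently with probability `eps`, leaves the remaining observations fully observed, and uses one hypothetical MNAR rule (hide every coordinate whose value exceeds `threshold`). The function names and parameters are illustrative only.

```python
import random


def sample_with_contamination(n, d, mu, eps, threshold, seed=0):
    """Draw n samples from N(mu, I_d). Each sample is contaminated with
    probability eps; contaminated samples pass through a hypothetical MNAR
    rule that hides (sets to None) every coordinate above `threshold`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(mu[j], 1.0) for j in range(d)]
        if rng.random() < eps:
            # MNAR: whether an entry is missing depends on its own value.
            x = [None if v > threshold else v for v in x]
        rows.append(x)
    return rows


def available_case_mean(rows, d):
    """Naive baseline: coordinate-wise mean over observed entries only."""
    means = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


# Illustrative parameters (assumptions, not values from the paper).
rows = sample_with_contamination(
    n=20000, d=3, mu=[0.0, 0.0, 0.0], eps=0.3, threshold=0.0
)
naive = available_case_mean(rows, d=3)
print(naive)
```

Under these assumed parameters the naive estimate is biased downward in every coordinate, because the masking rule preferentially hides large values; a short population calculation puts the bias near $-0.15\sqrt{2/\pi}/0.85 \approx -0.14$. This is the kind of adversarial missingness that an estimator in this setting has to withstand, although the mechanism considered in the paper is arbitrary and unknown rather than this particular threshold rule.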

Taxonomy

Topics: Statistical Methods and Inference · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms