High-Dimensional Gaussian Mean Estimation under Realizable Contamination
Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas

TL;DR
This paper studies how to estimate the mean of a high-dimensional Gaussian distribution when samples go missing in a structured, adversarial manner, establishing computational limits and a nearly matching efficient algorithm for this contamination model.
Contribution
It establishes an information-computation gap for Gaussian mean estimation under realizable contamination, providing both lower bounds in the Statistical Query model and a nearly matching efficient algorithm.
Findings
Proves an SQ lower bound indicating computational hardness.
Develops an algorithm with near-optimal sample and runtime tradeoff.
Characterizes the complexity of mean estimation under structured missing data.
Abstract
We study mean estimation for a Gaussian distribution with identity covariance in $\mathbb{R}^d$ under a missing data scheme termed the realizable $\epsilon$-contamination model. In this model, an adversary can choose a function $r(x)$ taking values between 0 and $\epsilon$, and each sample $x$ goes missing with probability $r(x)$. Recent work (Ma et al., 2024) proposed this model as an intermediate-strength setting between Missing Completely At Random (MCAR) -- where missingness is independent of the data -- and Missing Not At Random (MNAR) -- where missingness may depend arbitrarily on the sample values and can lead to non-identifiability issues. That work established information-theoretic upper and lower bounds for mean estimation in the realizable contamination model. Their proposed estimators incur runtime exponential in the dimension, leaving open the possibility of computationally efficient algorithms…
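To make the contamination model concrete, the following is a minimal simulation sketch, not the paper's algorithm. It assumes the adversary's function is written as `r` and its cap as `eps` (names chosen here for illustration), and it uses a hypothetical adversary that hides points whose first coordinate is positive. The helper names `sample_with_realizable_contamination`, `empirical_mean`, and `adversary` are likewise illustrative. The sketch only shows why the naive empirical mean of the surviving samples can be biased under this kind of missingness.

```python
import random


def sample_with_realizable_contamination(mu, eps, r, n, seed=0):
    """Draw n points from N(mu, I) and drop each point x with probability r(x).

    The function r is the adversary's choice and must satisfy 0 <= r(x) <= eps.
    Only the points that survive are returned.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        # Identity covariance: coordinates are independent unit-variance Gaussians.
        x = [rng.gauss(m, 1.0) for m in mu]
        p_missing = r(x)
        if not 0.0 <= p_missing <= eps:
            raise ValueError("r(x) must lie in [0, eps]")
        # Keep the sample with probability 1 - r(x).
        if rng.random() >= p_missing:
            observed.append(x)
    return observed


def empirical_mean(points):
    """Coordinate-wise average of a non-empty list of equal-length points."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]


eps = 0.2


def adversary(x):
    # Hypothetical adversary (an assumption for illustration): hide points
    # with positive first coordinate at the maximum allowed rate eps.
    return eps if x[0] > 0.0 else 0.0


mu = [0.0, 0.0, 0.0]
n = 20000

# r(x) = 0 everywhere: no sample goes missing.
clean = sample_with_realizable_contamination(mu, eps, lambda x: 0.0, n)
# Missingness that depends on the sample value, capped at eps.
biased = sample_with_realizable_contamination(mu, eps, adversary, n)

clean_mean = empirical_mean(clean)
biased_mean = empirical_mean(biased)
```

With this adversary, roughly an `eps / 2` fraction of the samples disappears, and the first coordinate of `biased_mean` is pulled below the true value of 0, while `clean_mean` stays close to `mu`. An estimator for this model has to remain accurate against every admissible choice of `r`, not just this one.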
Taxonomy
Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods
