Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

Lunjia Hu; Omer Reingold

arXiv:2008.08071 · cs.DS · May 4, 2021 · 1 citation

TL;DR

This paper presents algorithms that robustly estimate the mean of a high-dimensional distribution from highly incomplete data containing arbitrary outliers, achieving information-theoretically optimal, dimension-independent error guarantees in nearly-linear time.

Contribution

It extends robust mean estimation methods to settings with highly incomplete data and outliers, providing nearly-linear time algorithms with optimal guarantees.

Findings

01

Achieves dimension-independent error bounds

02

Handles highly incomplete data, where most coordinates of every example may be missing

03

Operates in nearly-linear time with respect to data size and dimension

Abstract

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde{O}(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.
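To make the data model in the abstract concrete, the sketch below simulates it and runs a simple baseline. It is not the paper's algorithm. The function names, the Gaussian data, the independent random mask, and the fixed outlier value of 100 are all illustrative assumptions. The baseline is a coordinate-wise median over observed entries, whose worst-case Euclidean error is known to grow with the dimension, which is the limitation that dimension-independent guarantees avoid.

```python
import random
import statistics


def make_incomplete_samples(n, d, true_mean, keep_prob, eps, seed=0):
    """Simulate the setting (illustrative assumptions, not from the paper):
    unit-variance Gaussian entries, each observed independently with
    probability keep_prob (None marks a missing entry), and the first
    int(eps * n) examples overwritten with a fixed outlier value."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        row = []
        for j in range(d):
            if rng.random() < keep_prob:
                row.append(rng.gauss(true_mean[j], 1.0))
            else:
                row.append(None)
        samples.append(row)
    # A real adversary may corrupt examples arbitrarily; here the observed
    # entries of each corrupted example are simply set to 100.0.
    for i in range(int(eps * n)):
        samples[i] = [None if x is None else 100.0 for x in samples[i]]
    return samples


def observed_column(samples, j):
    """Collect the observed (non-missing) values of coordinate j."""
    col = [row[j] for row in samples if row[j] is not None]
    if not col:
        raise ValueError(f"coordinate {j} is never observed")
    return col


def observed_median(samples, d):
    """Baseline: median of the observed values, one coordinate at a time."""
    return [statistics.median(observed_column(samples, j)) for j in range(d)]


def observed_mean(samples, d):
    """Non-robust reference: plain average of the observed values."""
    return [statistics.fmean(observed_column(samples, j)) for j in range(d)]


if __name__ == "__main__":
    data = make_incomplete_samples(
        n=2000, d=5, true_mean=[0.0] * 5, keep_prob=0.3, eps=0.05, seed=1
    )
    print("median:", observed_median(data, 5))
    print("mean:  ", observed_mean(data, 5))
```

With `keep_prob=0.3` and `eps=0.05`, each coordinate is observed in roughly $0.3N = 6\varepsilon N$ examples, which matches the abstract's assumption that every coordinate appears in a constant factor more than $\varepsilon N$ examples. In this toy setup the plain average is pulled far from the true mean by the outliers, while the per-coordinate median stays close to it.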

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring