The Sample Complexity of Robust Covariance Testing

Ilias Diakonikolas; Daniel M. Kane

arXiv:2012.15802 · cs.LG · January 1, 2021 · 1 cites

TL;DR

This paper shows that testing the covariance matrix of a high-dimensional Gaussian requires substantially more samples when the data are contaminated than in the clean case.

Contribution

It establishes a tight lower bound of $\Omega(d^2)$ on the sample complexity of robust covariance testing, demonstrating the problem's inherent difficulty compared to non-robust settings.

Findings

01

Robust covariance testing requires $\Omega(d^2)$ samples.

02

Robust covariance testing is at least as hard as robust covariance estimation in terms of sample complexity.

03

In the absence of contamination, testing requires only O(d) samples.

Abstract

We study the problem of testing the covariance matrix of a high-dimensional Gaussian in a robust setting, where the input distribution has been corrupted in Huber's contamination model. Specifically, we are given i.i.d. samples from a distribution of the form $Z = (1 - \epsilon) X + \epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, \Sigma)$, $B$ is a fixed but unknown noise distribution, and $\epsilon > 0$ is an arbitrarily small constant representing the proportion of contamination. We want to distinguish between the cases that $\Sigma$ is the identity matrix versus $\gamma$-far from the identity in Frobenius norm. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. Moreover, this sample upper bound was shown to be best possible, within constant factors. Our main result is that the…
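The setup in the abstract can be illustrated with a minimal simulation sketch. It reads $Z = (1 - \epsilon) X + \epsilon B$ as a mixture of distributions (each sample comes from $B$ with probability $\epsilon$ and from $X$ otherwise), which is the usual meaning of Huber's contamination model. The function names, the point-mass choice of $B$, and the naive plug-in statistic $\|\hat{\Sigma} - I\|_F$ are assumptions made for illustration; this is neither the paper's construction nor the $O(d)$-sample tester from prior work.

```python
import numpy as np


def sample_huber_contaminated(n, sigma, eps, noise_sampler, rng):
    """Draw n samples from the mixture (1 - eps) * N(0, sigma) + eps * B.

    noise_sampler is a hypothetical callable (n, d, rng) -> (n, d) array
    standing in for the unknown noise distribution B.
    """
    d = sigma.shape[0]
    clean = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    noise = noise_sampler(n, d, rng)
    # Each sample is independently replaced by noise with probability eps.
    is_outlier = rng.random(n) < eps
    return np.where(is_outlier[:, None], noise, clean)


def frobenius_distance_from_identity(samples):
    """Naive plug-in statistic: ||S - I||_F, with S the empirical second moment.

    The mean is known to be zero, so no centering is applied.
    """
    n, d = samples.shape
    second_moment = samples.T @ samples / n
    return np.linalg.norm(second_moment - np.eye(d), ord="fro")


def point_mass_noise(n, d, rng):
    # Illustrative choice of B (an assumption): every outlier sits at (3, ..., 3).
    return np.full((n, d), 3.0)


rng = np.random.default_rng(0)
d, n, eps = 5, 20000, 0.1
identity = np.eye(d)

# In both runs the true covariance is the identity; only eps differs.
clean_samples = sample_huber_contaminated(n, identity, 0.0, point_mass_noise, rng)
dirty_samples = sample_huber_contaminated(n, identity, eps, point_mass_noise, rng)

clean_stat = frobenius_distance_from_identity(clean_samples)
dirty_stat = frobenius_distance_from_identity(dirty_samples)
print(f"clean: {clean_stat:.3f}  contaminated: {dirty_stat:.3f}")
```

In this sketch the contaminated statistic comes out far from zero even though $\Sigma$ is exactly the identity, so a non-robust plug-in test would wrongly report a large Frobenius distance. That is only a toy illustration of why contamination matters, not an argument for the lower bound itself.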

Taxonomy

Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference