Detection of correlations

Ery Arias-Castro; Sébastien Bubeck; Gábor Lugosi

arXiv:1106.1193·math.ST·June 4, 2012

TL;DR

This paper studies the problem of testing whether a high-dimensional normal vector has independent components or contains a small subset of correlated components. It gives upper and lower bounds on the minimax detection risk and examines which tests are near-optimal.

Contribution

It establishes minimax risk bounds for correlation detection and compares the performance of simple tests versus the generalized likelihood ratio test.

Findings

01

Some simple tests are near-optimal in many cases

02

The generalized likelihood ratio test is suboptimal in some important cases

03

Risk bounds depend on the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets

Abstract

We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
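The testing problem in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes one common way to model the alternative, in which the coordinates in a subset S share a pairwise correlation rho and all coordinates have unit variance. The function names, the shared-factor construction, and the normalized squared-sum statistic are illustrative assumptions, chosen as an example of a "simple test" statistic.

```python
import math
import random


def sample_vector(n, subset, rho, rng):
    """Draw a vector of n unit-variance normal coordinates.

    Assumption (illustrative): coordinates whose indices are in `subset`
    have pairwise correlation `rho`; all other coordinates are independent.
    An empty subset corresponds to the null hypothesis.
    """
    shared = rng.gauss(0.0, 1.0)  # common factor inducing the correlation
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in subset:
        # sqrt(rho)*shared + sqrt(1-rho)*noise has variance 1 and
        # covariance rho with every other coordinate built the same way.
        x[i] = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * x[i]
    return x


def squared_sum_statistic(x):
    """Normalized squared sum: (sum of x_i)^2 / n.

    Under independence the sum has variance n, so this statistic has
    mean 1; positive correlation inflates it.
    """
    return sum(x) ** 2 / len(x)


def variance_of_sum(n, k, rho):
    """Variance of the coordinate sum when k coordinates are correlated."""
    return n + rho * k * (k - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, rho, trials = 200, 40, 0.5, 2000
    subset = range(k)  # hypothetical choice of the correlated set
    null_mean = sum(
        squared_sum_statistic(sample_vector(n, [], rho, rng))
        for _ in range(trials)
    ) / trials
    alt_mean = sum(
        squared_sum_statistic(sample_vector(n, subset, rho, rng))
        for _ in range(trials)
    ) / trials
    print("average statistic under independence:", null_mean)
    print("average statistic with a correlated subset:", alt_mean)
    print("theoretical mean under the alternative:",
          variance_of_sum(n, k, rho) / n)
```

In this model the statistic separates the two hypotheses only when rho * k * (k - 1) is not small relative to n, which gives some intuition for why the risk bounds involve both the subset size and the correlation level. The sketch does not use any known structure of the class of candidate sets.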
