Detection of correlations
Ery Arias-Castro, Sébastien Bubeck, Gábor Lugosi

TL;DR
This paper investigates the challenge of detecting whether high-dimensional data has independent components or contains a small correlated subset, providing bounds on detection risk and analyzing test optimality.
Contribution
It establishes minimax risk bounds for correlation detection and compares the performance of simple tests versus the generalized likelihood ratio test.
Findings
Simple tests are near-optimal in many cases
The generalized likelihood ratio (GLR) test is suboptimal in some important cases
Bounds depend on the size of the correlated subset, the correlation level, and the structure of the class of candidate sets
Abstract
We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
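To make the testing problem concrete, the following is a minimal Python sketch and is not taken from the paper. It assumes unit-variance components and a common pairwise correlation `rho` inside the correlated subset, generated through a shared latent factor. The function names, the parameter values, and the choice of a squared-sum statistic as the "simple test" are all illustrative assumptions; the abstract does not specify which simple tests the authors analyze.

```python
import math
import random


def sample_null(n, rng):
    """Draw n independent standard normal components (null hypothesis)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sample_alternative(n, subset, rho, rng):
    """Draw n standard normal components where those indexed by `subset`
    share pairwise correlation rho (assumed one-factor construction)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = rng.gauss(0.0, 1.0)  # shared latent factor for the correlated subset
    for i in subset:
        # Variance stays 1; covariance between two subset members is rho.
        x[i] = math.sqrt(rho) * z + math.sqrt(1.0 - rho) * x[i]
    return x


def squared_sum_statistic(x):
    """Illustrative simple statistic: (sum of components)^2 / n.
    Its mean is 1 under the null and 1 + rho*k*(k-1)/n under the
    alternative sketched here, where k is the subset size."""
    return sum(x) ** 2 / len(x)


def mean_statistic(sampler, trials):
    """Average the statistic over repeated independent draws."""
    return sum(squared_sum_statistic(sampler()) for _ in range(trials)) / trials


rng = random.Random(0)
n, k, rho = 200, 60, 0.5          # hypothetical problem sizes
subset = range(k)                 # hypothetical correlated subset

null_mean = mean_statistic(lambda: sample_null(n, rng), 2000)
alt_mean = mean_statistic(lambda: sample_alternative(n, subset, rho, rng), 2000)
print(f"mean statistic under null: {null_mean:.2f}")
print(f"mean statistic under alternative: {alt_mean:.2f}")
```

In this sketch the statistic separates the two hypotheses only when `rho * k * (k - 1)` is large relative to `n`, which mirrors the abstract's point that detectability depends on the subset size and the correlation level. A statistic that ignores the location of the subset, as this one does, also makes no use of any known combinatorial structure.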
