Foundational principles for large scale inference: Illustrations through correlation mining
Alfred O. Hero, Bala Rajaratnam

TL;DR
This paper develops a unified framework for understanding the sample complexity of correlation mining in large-scale, high-dimensional data, and for characterizing the fundamental limits of inference when few samples are available.
Contribution
It introduces a comprehensive statistical framework for analyzing sample complexity across different asymptotic regimes in large-scale inference, especially for correlation mining.
Findings
Identifies distinct asymptotic regimes relevant to big data inference.
Quantifies sample complexity for correlation mining under various models.
Highlights the importance of the high-dimensional regime in which the sample size is fixed while the dimension grows without bound.
Abstract
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number of acquired samples (statistical replicates) is far fewer than the number of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size is fixed and the dimension grows without bound. To address this gap, we develop a unified statistical framework that…
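The "variable-rich but sample-starved" regime described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name `max_spurious_correlation`, the sample size `n=10`, and the dimensions tried are all assumptions chosen for illustration. It draws mutually independent Gaussian variables, so every true correlation is zero, and reports the largest absolute sample correlation found among all pairs.

```python
import numpy as np

def max_spurious_correlation(n, p, seed=0):
    """Largest |sample correlation| among p independent variables, n samples each.

    Hypothetical helper for illustration only; it is not the paper's method.
    """
    rng = np.random.default_rng(seed)
    # n samples (rows) of p independent standard normal variables (columns),
    # so the population correlation between any two variables is zero.
    X = rng.standard_normal((n, p))
    # p x p sample correlation matrix; rowvar=False treats columns as variables.
    R = np.corrcoef(X, rowvar=False)
    # Ignore the trivial self-correlations on the diagonal.
    np.fill_diagonal(R, 0.0)
    return float(np.abs(R).max())

# Hold the sample size fixed and let the number of variables grow.
results = {p: max_spurious_correlation(n=10, p=p) for p in (10, 100, 1000)}
for p, r in results.items():
    print(f"n=10, p={p}: max |r| = {r:.3f}")
```

With the sample size held fixed, the largest spurious correlation tends to drift toward 1 as the number of variables grows. This is one informal way to see why sample complexity, and not only computational complexity, limits what can be reliably inferred in this regime.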
