Large Scale Correlation Screening
Alfred O. Hero, Bala Rajaratnam

TL;DR
This paper develops asymptotic theory for correlation screening methods in high-dimensional data, enabling scalable identification of highly correlated variables across different applications, with validation on gene-expression data.
Contribution
It introduces a theoretical framework for threshold-based correlation screening, addressing phase transition phenomena and false positive rates in high-dimensional settings.
Findings
Asymptotic expressions for mean number of discoveries
Identification of phase transition thresholds
Validation on large gene-expression dataset
Abstract
This paper treats the problem of screening for variables with high correlations in high-dimensional data in which there can be many fewer samples than variables. We focus on threshold-based correlation screening methods for three related applications: screening for variables with large correlations within a single treatment (autocorrelation screening); screening for variables with large cross-correlations over two treatments (cross-correlation screening); and screening for variables that have persistently large autocorrelations over two treatments (persistent-correlation screening). The novelty of correlation screening is that it identifies a smaller number of variables which are highly correlated with others, as compared to identifying a number of correlation parameters. Correlation screening suffers from a phase transition phenomenon: as the correlation threshold decreases, the number of…
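As a rough illustration of the autocorrelation screening described in the abstract, the sketch below keeps every variable whose largest absolute sample correlation with any other variable meets a threshold. It is a minimal sketch, not the authors' implementation: the function name autocorrelation_screen, the use of absolute correlations, the synthetic Gaussian data, and the particular thresholds are all assumptions made for the example.

```python
import numpy as np

def autocorrelation_screen(X, rho):
    """Indices of variables whose largest absolute sample correlation
    with any *other* variable is at least rho.

    X   : array of shape (n_samples, n_variables)
    rho : screening threshold in [0, 1] (illustrative parameter name)
    """
    R = np.corrcoef(X, rowvar=False)      # p x p sample correlation matrix
    np.fill_diagonal(R, 0.0)              # ignore each variable's self-correlation
    max_abs = np.max(np.abs(R), axis=1)   # strongest correlation per variable
    return np.flatnonzero(max_abs >= rho)

# Synthetic data (assumption): far fewer samples than variables.
rng = np.random.default_rng(0)
n, p = 20, 200
X = rng.standard_normal((n, p))
# Plant one strongly correlated pair: variable 1 is a noisy copy of variable 0.
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)

hits = autocorrelation_screen(X, 0.95)

# Number of discoveries at progressively lower thresholds.
thresholds = (0.95, 0.8, 0.6, 0.4)
counts = [len(autocorrelation_screen(X, rho)) for rho in thresholds]
print(dict(zip(thresholds, counts)))
```

Because the remaining 198 columns are independent noise, any discoveries among them are false positives. Printing the counts for decreasing thresholds gives an informal view of how the number of discoveries grows as the threshold is lowered when samples are scarce.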
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Bioinformatics and Genomic Networks
