A note on marginal correlation based screening
Run Wang, Somak Dutta, Vivekananda Roy

TL;DR
This paper highlights potential issues with marginal correlation based screening methods in high-dimensional data, especially when predictors are correlated, and demonstrates their limitations through examples and a genome-wide association study.
Contribution
It provides simple examples to illustrate the problems of marginal correlation screening in correlated predictor settings and compares its performance with alternative methods.
Findings
Marginal correlation screening can fail with correlated predictors.
The problems are demonstrated through simple examples and a real data set from a genome-wide association study.
In the genome-wide association study example, marginal correlation screening performs worse than another screening method.
Abstract
Independence screening methods such as the two-sample t-test and marginal correlation based ranking are among the most widely used techniques for variable selection in ultrahigh dimensional data sets. In this short note, simple examples are used to demonstrate potential problems with independence screening methods in the presence of correlated predictors. An example is also considered in which all important variables are independent among themselves and all but one of the important variables are independent of the unimportant variables. Furthermore, a real data example from a genome-wide association study is used to illustrate the inferior performance of marginal correlation screening compared to another screening method.
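The failure mode described in the abstract can be sketched with a small simulation. This is a toy construction chosen for illustration, not one of the paper's examples: the function names `marginal_correlations` and `simulate`, the sample size, the number of predictors, and the correlation `rho` are all assumptions. Two predictors are both important, but they are correlated, and the coefficients are chosen so that the first predictor has zero population covariance with the response. Marginal correlation ranking then treats it like a noise variable, while a joint least-squares fit on the two predictors should recover its coefficient.

```python
import numpy as np


def marginal_correlations(X, y):
    """Absolute sample Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def simulate(n=2000, p=50, rho=0.5, seed=0):
    """Hypothetical design: columns 0 and 1 are important and correlated."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Give column 1 population correlation rho with column 0.
    X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 1]
    beta = np.zeros(p)
    beta[0] = 1.0
    beta[1] = -1.0 / rho  # chosen so that cov(X[:, 0], y) = 1 - (1/rho) * rho = 0
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta


X, y, beta = simulate()
corr = marginal_correlations(X, y)

# Rank of the important predictor 0 under marginal correlation screening
# (rank 1 means largest absolute correlation).
rank_of_x0 = int((corr > corr[0]).sum()) + 1

# Joint least-squares fit on the two important predictors only.
coef, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)

print("abs. marginal correlation of predictor 0:", corr[0])
print("abs. marginal correlation of predictor 1:", corr[1])
print("rank of predictor 0 among", X.shape[1], "predictors:", rank_of_x0)
print("joint least-squares coefficients for predictors 0 and 1:", coef)
```

In this design the response does not depend on predictor 0 marginally, so its rank among the predictors is essentially arbitrary and a screening rule that keeps only the top few marginal correlations can easily drop it, even though its regression coefficient is 1.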
Taxonomy
Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
