Robust distance correlation for variable screening
Tianzhou Ma, Hongjie Ke, Zhao Ren

TL;DR
This paper introduces a robust screening method based on distance correlation, tailored to ultrahigh-dimensional data with heavy tails, with the aim of improving feature selection and prediction accuracy in complex datasets.
Contribution
The paper proposes a novel robust distance correlation screening method that effectively handles heavy-tailed data, improving feature selection in ultrahigh-dimensional regression.
Findings
Outperforms existing screening methods in simulations with heavy-tailed data.
Improves gene prioritization in TCGA pancreatic cancer RNA-seq data.
Demonstrates robustness and efficiency in high-dimensional, heavy-tailed scenarios.
Abstract
High-dimensional data are common in modern statistical applications, and variable selection methods play an indispensable role in identifying the features that are critical for scientific discovery. Traditional best subset selection is computationally intractable when the number of features is large. Regularization methods such as Lasso, SCAD and their variants perform poorly on ultrahigh-dimensional data because of low computational efficiency and unstable algorithms. Sure screening methods have become popular alternatives. They first rapidly reduce the dimension using simple measures such as marginal correlation, and then apply any regularization method to the retained features. A number of screening methods have been developed for different models and problems. However, none of them targets heavy-tailed data, which is another important characteristic of modern big data. In this paper,…
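The screening step described above can be made concrete with a minimal Python/NumPy sketch of plain, non-robust distance correlation screening, in the spirit of DC-SIS (Li, Zhong and Zhu, 2012). The sketch uses the sample distance correlation of Székely, Rizzo and Bakirov (2007). It is not the robust estimator proposed in this paper, whose details are not given in this excerpt. The names `distance_correlation` and `dc_screen`, the cut-off `d`, and the simulated data are illustrative assumptions.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by column, row and grand means."""
    dist = np.abs(v[:, None] - v[None, :])
    return (dist
            - dist.mean(axis=0, keepdims=True)
            - dist.mean(axis=1, keepdims=True)
            + dist.mean())


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D samples."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()   # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:            # a constant sample has zero distance variance
        return 0.0
    # max(..., 0) only guards against tiny negative floating-point round-off
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


def dc_screen(X, y, d):
    """Hypothetical helper: rank the columns of X by marginal distance
    correlation with y and return the indices of the top d, plus all scores."""
    scores = np.array([distance_correlation(X[:, k], y) for k in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 500
    X = rng.standard_normal((n, p))
    # Simulated active features: 0 (linear effect) and 1 (quadratic effect).
    y = 2.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.standard_normal(n)
    d = int(n / np.log(n))  # assumed cut-off, a convention from the SIS literature
    keep, scores = dc_screen(X, y, d)
    print("retained:", sorted(keep.tolist()))
    print("features 0 and 1 retained:", {0, 1} <= set(keep.tolist()))
```

The sketch uses distance correlation because it is zero only under independence, given finite first moments. It can therefore pick up non-monotone effects such as the quadratic term, which marginal Pearson correlation would miss. The sketch is also built from raw pairwise distances, so a few extreme observations can dominate the scores when the data are heavy-tailed. Heavy-tailed data is the setting the paper's robust variant targets.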
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Bayesian Methods and Mixture Models
