Covariance-Insured Screening

Kevin He; Jian Kang; Hyokyoung Grace Hong; Ji Zhu; Yanming Li; Huazhen Lin; Han Xu; Yi Li

arXiv:1805.06595 · stat.ML · May 18, 2018

TL;DR

This paper introduces a covariance-insured screening method that leverages inter-feature dependence to detect weak but jointly informative predictors in ultrahigh-dimensional data, improving biomarker discovery.

Contribution

The paper proposes a novel covariance-insured screening approach that incorporates correlation information to identify weak signals missed by existing methods.

Findings

01. The method detects weak signals effectively in simulations.

02. Applied to cancer data, the method identifies potential genetic factors.

03. The method outperforms traditional screening methods in accuracy.

Abstract

Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to detect signals weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss these weak signals. By incorporating the inter-feature dependence, we propose a covariance-insured screening methodology to identify predictors that are jointly informative but only marginally weakly associated with outcomes. The validity of the method is examined via extensive simulations and real data studies for selecting potential genetic factors related to the onset of cancer.
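
To make the idea of a predictor that is "jointly informative but only marginally weakly associated" concrete, here is a minimal sketch of a two-predictor toy model. It is not the authors' screening procedure. The correlation `rho`, the coefficient vector `beta`, and both helper functions are assumptions chosen for illustration. The example works at the population level (no sampling) and takes the extreme case in which the second predictor's marginal covariance with the outcome is exactly zero even though its joint coefficient is not.

```python
# Illustrative toy model only; not the procedure proposed in the paper.
# All names and numbers below are hypothetical choices for this example.

def marginal_covariances(sigma, beta):
    """Cov(x_j, y) for y = sum_k beta[k] * x_k + noise, where Cov(x) = sigma."""
    p = len(beta)
    return [sum(sigma[j][k] * beta[k] for k in range(p)) for j in range(p)]


def joint_coefficients_2x2(sigma, cov_xy):
    """Solve sigma @ b = cov_xy for a 2x2 sigma (population least squares)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    return [
        (d * cov_xy[0] - b * cov_xy[1]) / det,
        (-c * cov_xy[0] + a * cov_xy[1]) / det,
    ]


rho = 0.5                          # assumed correlation between x1 and x2
sigma = [[1.0, rho], [rho, 1.0]]   # covariance matrix of the two predictors
beta = [1.0, -rho]                 # x2's direct effect cancels what it inherits from x1

cov_xy = marginal_covariances(sigma, beta)
joint = joint_coefficients_2x2(sigma, cov_xy)

print("marginal covariances with y:", cov_xy)
print("joint coefficients:", joint)
```

A screen that ranks predictors by marginal covariance alone would discard the second predictor here, while a fit that accounts for the covariance between the two predictors recovers its nonzero coefficient. This is the kind of signal the abstract says correlation-ignoring screens are likely to miss.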

Taxonomy

Topics: Statistical Methods and Inference · Gene expression and cancer classification · Genetic Associations and Epidemiology