Efficient Signal Inclusion With Genomic Applications
X. Jessie Jeng, Teng Zhang, and Jung-Ying Tzeng

TL;DR
This paper introduces the signal missing rate, a new measure for false negative control in high-dimensional data, and proposes data-adaptive procedures that improve signal detection efficiency in genomic studies such as GWAS.
Contribution
It develops novel data-adaptive methods that control the signal missing rate without incurring many unnecessary false positives under dependence, and supports them with theory and simulation.
Findings
Effective removal of irrelevant SNPs in GWAS
High retention of relevant SNPs for analysis
Validated efficiency and adaptivity through simulations
Abstract
This paper addresses the challenge of efficiently capturing a high proportion of true signals for subsequent data analyses when sample sizes are relatively limited with respect to data dimension. We propose the signal missing rate as a new measure for false negative control to account for the variability of false negative proportion. Novel data-adaptive procedures are developed to control signal missing rate without incurring many unnecessary false positives under dependence. We justify the efficiency and adaptivity of the proposed methods via theory and simulation. The proposed methods are applied to GWAS on human height to effectively remove irrelevant SNPs while retaining a high proportion of relevant SNPs for subsequent polygenic analysis.
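The abstract's point about the variability of the false negative proportion (FNP) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the signal missing rate can be read as the probability that the FNP exceeds a tolerance `epsilon`, uses independent Gaussian test statistics (the paper addresses dependence, which this sketch ignores), and applies a fixed threshold rather than a data-adaptive one. All function names and parameter values are hypothetical.

```python
import random


def false_negative_proportion(is_signal, selected):
    """Fraction of true signals that the selection failed to include."""
    n_signals = sum(is_signal)
    if n_signals == 0:
        return 0.0
    missed = sum(1 for s, k in zip(is_signal, selected) if s and not k)
    return missed / n_signals


def simulate_fnp(n_features=500, n_signals=25, effect=3.0, threshold=2.0, rng=None):
    """Draw one data set and return the FNP of a fixed-threshold selection.

    Assumption: independent N(effect, 1) statistics for signals and
    N(0, 1) statistics for noise features.
    """
    rng = rng or random.Random()
    is_signal = [i < n_signals for i in range(n_features)]
    z = [rng.gauss(effect if s else 0.0, 1.0) for s in is_signal]
    selected = [abs(v) > threshold for v in z]
    return false_negative_proportion(is_signal, selected)


def estimate_missing_rate(epsilon=0.1, n_reps=200, seed=0, **kwargs):
    """Monte Carlo estimates of the mean FNP and of P(FNP > epsilon).

    Assumption: P(FNP > epsilon) is used here as a stand-in for the
    signal missing rate; the text above does not give the formula.
    """
    rng = random.Random(seed)
    fnps = [simulate_fnp(rng=rng, **kwargs) for _ in range(n_reps)]
    mean_fnp = sum(fnps) / n_reps
    exceed_rate = sum(1 for f in fnps if f > epsilon) / n_reps
    return mean_fnp, exceed_rate


if __name__ == "__main__":
    mean_fnp, exceed_rate = estimate_missing_rate(epsilon=0.1)
    print(f"mean FNP: {mean_fnp:.3f}, estimated P(FNP > 0.1): {exceed_rate:.3f}")
```

The two returned numbers can differ substantially: a selection rule may keep the average FNP low while still missing more than `epsilon` of the signals in a sizeable share of data sets, which is the kind of variability a tail-probability measure is meant to capture.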
Taxonomy
Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
