Model-Free, Monotone Invariant and Computationally Efficient Feature Screening with Data-adaptive Threshold
Linsui Deng, Yilin Zhang

TL;DR
This paper introduces SIT-BY, a model-free, monotone-invariant feature screening method that is computationally efficient and adaptively controls false discovery rate, suitable for ultrahigh-dimensional data analysis.
Contribution
The paper proposes a novel screening procedure using sliced independence estimates that is model-free, invariant to monotone transformations, and computationally efficient, with FDR control.
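Invariance to monotone transformations can be illustrated with any rank-based dependence measure. The sketch below uses Spearman's rank correlation purely as a stand-in; it is not the paper's sliced independence estimate, and the helper names `ranks` and `spearman` are hypothetical. It assumes the data contain no ties.

```python
from math import exp

def ranks(values):
    # Rank of each entry (1 = smallest); assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho from rank differences (valid when there are no ties).
    # Stand-in for a rank-based dependence measure, not the paper's statistic.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

x = [0.3, -1.2, 2.5, 0.9, -0.4]
y = [1.1, -0.7, 3.0, 0.2, 0.5]

# exp is strictly increasing, so the ranks of x (and hence rho) are unchanged.
rho_raw = spearman(x, y)
rho_transformed = spearman([exp(v) for v in x], y)
```

Because the statistic depends on the data only through ranks, a screening rule built on it ranks covariates identically before and after a strictly increasing transformation of a covariate or of the response.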
Findings
Achieves almost linear computational complexity.
Asymptotically controls false discovery rate.
Demonstrates strong performance in simulations and genome data.
Abstract
Feature screening for ultrahigh-dimensional data generally proceeds in two essential steps. The first step is measuring and ranking the marginal dependence between the response and the covariates, and the second is determining the threshold. We develop a new screening procedure, called the SIT-BY procedure, that possesses appealing statistical properties in both steps. By employing sliced independence estimates in the measuring and ranking stage, our proposed procedure requires no model assumptions, remains invariant to monotone transformations, and achieves almost linear computational complexity. Inspired by false discovery rate (FDR) control procedures, we offer a data-adaptive threshold that benefits from the asymptotic normality of the test statistics. Under moderate conditions, we demonstrate that our procedure can asymptotically control the FDR while maintaining the sure screening property. We investigate…
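The thresholding step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "BY" refers to the Benjamini-Yekutieli step-up procedure, that each covariate's statistic has already been standardized to be approximately N(0, 1) under independence, and that large positive values indicate dependence (a one-sided test). The function name `by_select` and the level `alpha` are hypothetical.

```python
from math import erfc, sqrt

def by_select(z_scores, alpha=0.1):
    """Return sorted indices of covariates kept by a BY-style step-up rule.

    Assumptions (not taken from the paper): z_scores are approximately
    standard normal under the null, and the test is one-sided.
    """
    p = len(z_scores)
    # One-sided normal p-values: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    pvals = [0.5 * erfc(z / sqrt(2.0)) for z in z_scores]
    # Covariate indices ordered from smallest to largest p-value.
    order = sorted(range(p), key=lambda j: pvals[j])
    # Harmonic-sum correction that distinguishes BY from Benjamini-Hochberg.
    harmonic = sum(1.0 / i for i in range(1, p + 1))
    # Step-up: find the largest rank whose p-value is below its critical value.
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= alpha * rank / (p * harmonic):
            k = rank
    return sorted(order[:k])

selected = by_select([5.0, 4.5, 0.1, -0.3, 0.2, 6.0, 0.0, -1.0, 0.5, 0.3])
```

The number of retained covariates is determined by the data rather than fixed in advance, which is what makes the threshold data-adaptive.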
Taxonomy
Topics: Molecular Biology Techniques and Applications · Genetic and phenotypic traits in livestock · Gene expression and cancer classification
