A flexible model-free prediction-based framework for feature ranking
Jingyi Jessica Li, Yiling Chen, and Xin Tong

TL;DR
This paper introduces two model-free, prediction-based criteria for marginal feature ranking in binary prediction. Unlike traditional criteria, they account for feature distributions and prediction objectives. Both are proven consistent at the sample level, and NPC is shown to be robust to sampling bias.
Contribution
The paper proposes nonparametric ranking criteria aligned with prediction objectives. These improve the accuracy and robustness of feature ranking, particularly in biomedical research where data are subject to sampling bias.
Findings
Both criteria achieve high-probability sample-level consistency.
The Neyman-Pearson criterion (NPC) is robust to sampling bias.
Simulations and real-data analyses support the advantages of the proposed methods.
Abstract
Despite the availability of numerous statistical and machine learning tools for joint feature modeling, many scientists investigate features marginally, i.e., one feature at a time. This is partly due to training and convention, but it is also rooted in scientists' strong interest in simple visualization and interpretability. As such, marginal feature ranking for some predictive tasks, e.g., prediction of cancer driver genes, is widely practiced in the process of scientific discovery. In this work, we focus on marginal ranking for binary prediction, arguably the most common predictive task. We argue that the most widely used marginal ranking criteria, including the Pearson correlation, the two-sample t test, and the two-sample Wilcoxon rank-sum test, do not fully take feature distributions and prediction objectives into account. To address this gap in practice, we propose two ranking criteria…
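To make the abstract's argument concrete, here is a minimal sketch of marginal ranking with two of the traditional criteria it names, the two-sample t test and the Wilcoxon rank-sum test. It is not the paper's proposed criteria. The helper names (`welch_t`, `rank_sum_auc`, `marginal_ranking`) and the toy data are assumptions made for illustration, and the rank-sum statistic is expressed in its normalized Mann-Whitney form (the fraction of cross-class pairs in which the class-1 value is larger).

```python
import math
import statistics


def welch_t(x0, x1):
    """Two-sample t statistic with unequal variances (Welch form)."""
    m0, m1 = statistics.fmean(x0), statistics.fmean(x1)
    v0, v1 = statistics.variance(x0), statistics.variance(x1)
    return (m1 - m0) / math.sqrt(v0 / len(x0) + v1 / len(x1))


def rank_sum_auc(x0, x1):
    """Normalized Mann-Whitney U: share of (class 0, class 1) pairs in
    which the class-1 value is larger; ties count one half."""
    wins = sum((b > a) + 0.5 * (b == a) for a in x0 for b in x1)
    return wins / (len(x0) * len(x1))


def marginal_ranking(features, labels, score):
    """Score each feature on its own and return names, best first."""
    scores = {}
    for name, values in features.items():
        x0 = [v for v, y in zip(values, labels) if y == 0]
        x1 = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = score(x0, x1)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: six class-0 samples followed by six class-1 samples.
labels = [0] * 6 + [1] * 6
features = {
    # Class 1 is shifted upward: the case the traditional criteria handle well.
    "shifted": [0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 1.0, 1.1, 0.9, 1.2, 0.8, 1.0],
    # Class 1 is bimodal around class 0: same mean, very different distribution.
    "bimodal": [-0.3, -0.2, -0.1, 0.1, 0.2, 0.3, -2.2, -2.0, -1.8, 1.8, 2.0, 2.2],
    # Nearly identical classes.
    "noise": [0.5, -0.4, 0.1, -0.2, 0.3, -0.3, 0.4, -0.5, 0.2, -0.1, 0.35, -0.25],
}

by_t = marginal_ranking(features, labels, lambda a, b: abs(welch_t(a, b)))
by_w = marginal_ranking(features, labels, lambda a, b: abs(rank_sum_auc(a, b) - 0.5))

# A simple rule on the "bimodal" feature alone: predict class 1 when |x| > 1.
bimodal_errors = sum(
    int(abs(x) > 1) != y for x, y in zip(features["bimodal"], labels)
)

print("ranking by |t|:", by_t)
print("ranking by rank-sum:", by_w)
print("errors of |x| > 1 rule on 'bimodal':", bimodal_errors)
```

In this toy setting both traditional criteria place the "bimodal" feature last, below the "noise" feature, even though a one-feature rule on it separates the two classes without error. This is one way a criterion that looks only at location shifts can overlook a feature's distribution and the prediction objective.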
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Machine Learning and Data Classification
