A flexible model-free prediction-based framework for feature ranking

Jingyi Jessica Li; Yiling Chen; and Xin Tong

arXiv:1903.05262 · stat.ME · December 1, 2021 · J. Mach. Learn. Res. · 1 citation

TL;DR

This paper introduces two model-free criteria for marginal feature ranking in binary prediction. Unlike the Pearson correlation, the two-sample t test, and the two-sample Wilcoxon rank-sum test, both criteria take feature distributions and prediction objectives into account, and both are shown to be consistent at the sample level.

Contribution

The paper proposes two nonparametric ranking criteria aligned with prediction objectives, aiming at more accurate and more robust feature ranking, especially in biomedical research where sampling bias is present.

Findings

01 Both criteria achieve sample-level consistency with high probability.

02 The Neyman-Pearson criterion (NPC) is robust to sampling bias.

03 Simulations and real-data studies validate the advantages of the proposed methods.

Abstract

Despite the availability of numerous statistical and machine learning tools for joint feature modeling, many scientists investigate features marginally, i.e., one feature at a time. This is partly due to training and convention, but it is also rooted in scientists' strong interest in simple visualization and interpretability. As such, marginal feature ranking for some predictive tasks, e.g., prediction of cancer driver genes, is widely practiced in the process of scientific discovery. In this work, we focus on marginal ranking for binary prediction, arguably the most common predictive task. We argue that the most widely used marginal ranking criteria, including the Pearson correlation, the two-sample t test, and the two-sample Wilcoxon rank-sum test, do not fully take feature distributions and prediction objectives into account. To address this gap in practice, we propose two ranking criteria…
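
To make the conventional practice described in the abstract concrete, here is a minimal Python sketch of marginal feature ranking with two of the criteria it names: the two-sample t statistic (Welch form) and the Wilcoxon rank-sum statistic (rescaled as an AUC). This is not the paper's proposed method and not code from the JSB-UCLA/frc repository. The function names (`welch_t`, `rank_sum_auc`, `marginal_rank`), the feature names, and the synthetic data are all assumptions made up for illustration.

```python
import math
import random
from statistics import mean, variance


def welch_t(x0, x1):
    """Absolute two-sample Welch t statistic for one feature."""
    se = math.sqrt(variance(x0) / len(x0) + variance(x1) / len(x1))
    return abs(mean(x1) - mean(x0)) / se


def rank_sum_auc(x0, x1):
    """Mann-Whitney U rescaled to [0.5, 1]; 0.5 means no separation by rank."""
    # U counts pairs where the class-1 value exceeds the class-0 value
    # (ties count as one half).
    u = sum((b > a) + 0.5 * (b == a) for a in x0 for b in x1)
    auc = u / (len(x0) * len(x1))
    return max(auc, 1.0 - auc)


def marginal_rank(features, labels, score):
    """Score each feature on its own, then sort feature names best-first."""
    scores = {}
    for name, values in features.items():
        x0 = [v for v, y in zip(values, labels) if y == 0]
        x1 = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = score(x0, x1)
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical synthetic data: 200 samples per class, three features.
rng = random.Random(0)
n = 200
labels = [0] * n + [1] * n
features = {
    # Class means differ by one standard deviation.
    "shifted": [rng.gauss(0, 1) for _ in range(n)]
    + [rng.gauss(1, 1) for _ in range(n)],
    # Same mean in both classes, but class 1 is bimodal around -3 and +3.
    "bimodal": [rng.gauss(0, 1) for _ in range(n)]
    + [rng.gauss(rng.choice([-3, 3]), 1) for _ in range(n)],
    # Identically distributed in both classes.
    "noise": [rng.gauss(0, 1) for _ in range(2 * n)],
}

t_order, t_scores = marginal_rank(features, labels, welch_t)
w_order, w_scores = marginal_rank(features, labels, rank_sum_auc)
print("t-test ranking:  ", t_order, t_scores)
print("rank-sum ranking:", w_order, w_scores)
```

The "bimodal" feature is included to illustrate the abstract's argument. Its class-conditional distributions differ sharply, so a rule based on its magnitude would predict the class well, yet both statistics compare only location or rank order, so they can be expected to score it close to the pure-noise feature.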

Code & Models

Repositories

JSB-UCLA/frc (official)

Taxonomy

Topics: Gene expression and cancer classification · Statistical Methods and Inference · Machine Learning and Data Classification