Quantifying uncertainty and stability among highly correlated predictors: a subspace perspective
Xiaozhu Zhang, Jacob Bien, Armeen Taeb

TL;DR
This paper introduces a subspace-based framework for feature selection in highly correlated data, providing continuous measures of similarity and stability that improve model robustness and predictive accuracy.
Contribution
The paper proposes a novel subspace perspective for assessing feature stability and model similarity, extending existing discrete, count-based notions so that they handle correlated features more effectively.
Findings
Improved stability and predictive performance on gene expression data.
Ability to identify multiple interchangeable stable models.
Framework accounts for feature correlation naturally.
Abstract
We study the problem of linear feature selection when features are highly correlated. Such settings pose two fundamental challenges. First, how should model similarity be defined? Simply counting features in common can be misleading: two models may share no features, yet highly correlated features can make the two models very similar in terms of predictive ability. Second, how can feature stability be assessed across runs of a variable selection method? High correlation can yield very different feature sets, so counting how often a feature is selected may label most features as unstable, and selecting stable features would result in models that are too small with poor predictive performance. In essence, these issues arise because existing notions of similarity and stability are "discrete" in nature. To overcome these challenges, we propose a novel framework based on feature subspaces --…
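As a rough illustration of the first challenge in the abstract, the sketch below compares a "discrete" similarity (Jaccard overlap of selected feature indices) with one possible continuous, subspace-based similarity. The abstract is truncated before it defines the paper's own measures, so the subspace similarity used here (the average squared cosine of the principal angles between the two column spans) is an assumption chosen for illustration, not the authors' definition. The synthetic data, the function names, and the noise level 0.05 are likewise hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt; drops (near-)linearly dependent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def jaccard(a, b):
    """Discrete similarity: shared feature indices over all selected indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def subspace_similarity(columns, a, b):
    """Illustrative continuous similarity (an assumption, not the paper's measure).

    Sum of squared cosines of the principal angles between span(columns[a])
    and span(columns[b]), divided by the larger subspace dimension, so the
    value lies in [0, 1].
    """
    qa = orthonormal_basis([columns[j] for j in a])
    qb = orthonormal_basis([columns[j] for j in b])
    if not qa or not qb:
        return 1.0 if not qa and not qb else 0.0
    overlap = sum(dot(u, v) ** 2 for u in qa for v in qb)
    return overlap / max(len(qa), len(qb))


def make_columns(n=200, seed=0):
    """Hypothetical data: features 0 and 1 are noisy copies of one latent
    signal (highly correlated); features 2 and 3 are independent noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    cols = [
        [zi + 0.05 * rng.gauss(0, 1) for zi in z],
        [zi + 0.05 * rng.gauss(0, 1) for zi in z],
        [rng.gauss(0, 1) for _ in range(n)],
        [rng.gauss(0, 1) for _ in range(n)],
    ]
    # Center each feature so spans are compared without an intercept effect.
    return [[x - sum(c) / n for x in c] for c in cols]


columns = make_columns()
for model_a, model_b in [([0], [1]), ([0, 2], [1, 2]), ([0], [3])]:
    print(model_a, model_b,
          "jaccard:", round(jaccard(model_a, model_b), 3),
          "subspace:", round(subspace_similarity(columns, model_a, model_b), 3))
```

In this toy setup the models {0} and {1} share no features, so their Jaccard similarity is zero, yet their spans nearly coincide because the two features are almost collinear. That is the kind of mismatch the abstract attributes to discrete notions of similarity.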
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Bioinformatics and Genomic Networks
