Feature Selection in High-dimensional Spaces Using Graph-Based Methods
Swarnadip Ghosh, Somabha Mukherjee, Divyansh Agarwal, Yichen He, Mingzhi Song, Xuejiao Pei

TL;DR
This paper introduces a nonparametric, graph-based feature selection method for high-dimensional data that effectively identifies influential features while controlling false discoveries, demonstrated through synthetic and real-world datasets.
Contribution
It presents a novel recursive graph-based algorithm for feature selection that works without distributional assumptions and outperforms existing methods on synthetic data.
Findings
Successfully recovers true features with high probability.
Outperforms existing methods on synthetic data.
Detects known and novel features in real datasets.
Abstract
High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We describe an algorithm for selecting informative features in high-dimensional data, where each observation comes from one of several different distributions. Our algorithm can be applied in a completely nonparametric setup without any distributional assumptions on the data, and it aims to output those features that contribute the most to the overall distributional variation. At the heart of our method is the recursive application of distribution-free graph-based tests on subsets of the feature set, located at different depths of a hierarchical clustering tree constructed from the data. Our algorithm recovers all truly contributing features…
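The recursive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which graph-based test is used, so the sketch assumes a Friedman-Rafsky-style count of minimum-spanning-tree edges that join observations with different labels, calibrated by label permutation. The function names (`graph_test_pvalue`, `select_features`), the average-linkage clustering of features, and the fixed threshold `alpha` are all hypothetical choices, and the sketch applies no correction for multiple testing.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def graph_test_pvalue(X, labels, n_perm=200, rng=None):
    """Permutation p-value for an MST edge-count test (assumed test, see lead-in).

    Few MST edges joining differently labelled observations suggest that
    the labelled groups come from different distributions.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    # The MST depends only on the data, so it is built once and reused.
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()

    def cross_edges(lab):
        return int(np.sum(lab[mst.row] != lab[mst.col]))

    observed = cross_edges(labels)
    null = [cross_edges(rng.permutation(labels)) for _ in range(n_perm)]
    return (1 + sum(c <= observed for c in null)) / (1 + n_perm)


def select_features(X, labels, alpha=0.05, n_perm=200, seed=0):
    """Recursively test feature subsets along a hierarchical clustering tree."""
    rng = np.random.default_rng(seed)
    # Cluster the features (columns of X), not the observations.
    root = to_tree(linkage(X.T, method="average"))
    selected = []

    def recurse(node):
        feats = node.pre_order()  # indices of the features under this node
        if graph_test_pvalue(X[:, feats], labels, n_perm, rng) > alpha:
            return  # no detectable difference in this subset: prune the branch
        if node.is_leaf():
            selected.append(node.id)
        else:
            recurse(node.left)
            recurse(node.right)

    recurse(root)
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: two groups of 30 observations, 8 features, of which
    # only the first two differ between the groups.
    data_rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 30)
    data = data_rng.normal(size=(60, 8))
    data[y == 1, :2] += 8.0
    print(select_features(data, y))
```

Pruning a branch as soon as its test fails to reject is what keeps the number of tests small when most features carry no signal. Noise features can still pass the threshold by chance in this sketch, since it makes no attempt at the false-discovery control described in the summary above.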
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Bayesian Methods and Mixture Models
