Nonparametric Bayesian Knockoff Generators for Feature Selection Under Complex Data Structure
Michael J. Martens, Anjishnu Banerjee, Xinran Qi, Yushu Shi

TL;DR
This paper introduces a nonparametric Bayesian approach for generating knockoff variables to improve feature selection in high-dimensional, complex data, with accurate false discovery rate (FDR) control and higher power than a Gaussian knockoff generator.
Contribution
It develops a novel nonparametric Bayesian model for knockoff generation that handles complex distributions, with theoretical guarantees and superior performance over existing methods.
Findings
Accurately controls FDR in simulations
Improves power over Gaussian knockoff generator
Successfully identifies predictive genes in ovarian cancer data
Abstract
The recent proliferation of high-dimensional data, such as electronic health records and genetics data, offers new opportunities to find novel predictors of outcomes. Presented with a large set of candidate features, interest often lies in selecting the ones most likely to be predictive of an outcome for further study. Controlling the false discovery rate (FDR) at a specified level is often desired in evaluating these variables. Knockoff filtering is an innovative strategy for conducting FDR-controlled feature selection. This paper proposes a nonparametric Bayesian model for generating high-quality knockoff copies that can improve the accuracy of predictive feature identification for variables arising from complex distributions, which can be skewed, highly dispersed and/or a mixture of distributions. This paper provides a detailed description for generating knockoff copies from a GDPM…
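To make the selection step of knockoff filtering concrete, here is a minimal sketch of the standard knockoff+ thresholding rule from the general knockoff-filter literature. It is not this paper's knockoff generator: it assumes that knockoff copies already exist and that a statistic W_j has been computed for each feature, with large positive values indicating that the original feature looks more important than its knockoff. The function names `knockoff_plus_threshold` and `knockoff_select` and the example values are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q=0.1):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    W : per-feature knockoff statistics (assumed already computed).
    q : target FDR level.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Knockoff+ estimate: (1 + #{W_j <= -t}) / max(1, #{W_j >= t}).
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target, so nothing is selected

def knockoff_select(W, q=0.1):
    """Indices of features whose statistic is at or above the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))

# Illustrative statistics only, not data from the paper.
W_example = [5, 4, 3.5, 3, 2.5, 2, -1, 0.5, -0.3, 0.2]
selected = knockoff_select(W_example, q=0.2)
```

The quality of the knockoff copies determines how informative the statistics W are, which is where the choice of generator matters; the thresholding rule itself stays the same.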
Taxonomy
Topics: Bayesian Methods and Mixture Models · Genetic Associations and Epidemiology · Gene expression and cancer classification
