Knockoffs for the mass: new feature importance statistics with false discovery guarantees
Jaime Roquero Gimenez, Amirata Ghorbani, James Zou

TL;DR
This paper advances the use of knockoff procedures in feature selection by developing efficient algorithms for generating valid knockoffs from Bayesian Networks and introducing new test statistics with improved power, backed by theoretical guarantees and extensive experiments.
Contribution
It introduces a novel efficient algorithm for generating knockoffs from Bayesian Networks and proposes new test statistics that enhance power while maintaining false discovery control.
Findings
New algorithm for Bayesian Network knockoffs
Improved feature importance test statistics
Validated methods on real and synthetic data
Abstract
An important problem in machine learning and statistics is to identify features that causally affect the outcome. This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features. For example, we want to identify that smoking really is correlated with cancer conditioned on demographics. The knockoff procedure is a recent breakthrough in statistics that, in theory, can identify truly correlated features while guaranteeing that the false discovery is limited. The idea is to create synthetic data -- knockoffs -- that captures correlations amongst the features. However, there are substantial computational and practical challenges to generating and using knockoffs. This paper makes several key advances that enable knockoff application to be more efficient and…
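To make the selection step of a knockoff procedure concrete, here is a minimal sketch of the standard knockoff+ thresholding rule from the general knockoff literature. It is not the new test statistics proposed in this paper. It assumes a vector `W` of per-feature importance statistics has already been computed from the real features and their knockoffs, where a large positive `W[j]` suggests feature `j` matters and null features are equally likely to give positive or negative values. The function names `knockoff_threshold` and `knockoff_select` and the target level `q` are illustrative choices, not taken from the paper.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Return the smallest t among the |W_j| such that the estimated
    false discovery proportion
        (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t})
    is at most q. offset=1 gives the knockoff+ rule.
    Returns np.inf when no threshold qualifies."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes, smallest first.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

def knockoff_select(W, q=0.1, offset=1):
    """Indices of features whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_threshold(W, q=q, offset=offset)
    return np.flatnonzero(W >= t)

# Hypothetical statistics for ten features (made-up numbers for illustration).
W_example = [5, 4, 3, 2.5, 2, -1, 1.5, -0.5, 0.2, 6]
selected = knockoff_select(W_example, q=0.2)
```

The count of negative statistics below `-t` acts as an estimate of how many null features slipped above `t`, which is why the ratio serves as a conservative estimate of the false discovery proportion. When no threshold meets the target level, the sketch returns an empty selection rather than relaxing `q`.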
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
