Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection
Emmanuel Candes, Yingying Fan, Lucas Janson, Jinchi Lv

TL;DR
This paper introduces model-X knockoffs, a framework for controlling the false discovery rate in high-dimensional variable selection, including nonlinear and nonparametric settings, even when the conditional distribution of the response given the covariates is unknown.
Contribution
It extends the knockoff procedure to arbitrary response models by constructing probabilistic knockoffs based on known covariate distributions, enabling valid inference in complex settings.
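As a rough illustration of constructing knockoffs from a known covariate distribution, the sketch below handles only the special case of Gaussian covariates. It assumes the rows of X are drawn i.i.d. from N(0, Sigma) with Sigma a known correlation matrix, and it uses the equicorrelated choice of the diagonal matrix D = s·I. The function name `gaussian_knockoffs`, the 0.99 shrinkage factor, and the AR(1) example covariance are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng):
    """Sample knockoffs for rows of X, assumed i.i.d. N(0, Sigma).

    Assumes Sigma is a known, positive-definite correlation matrix.
    Uses the equicorrelated construction D = s * I.
    """
    n, p = X.shape
    lam_min = np.linalg.eigvalsh(Sigma)[0]       # smallest eigenvalue of Sigma
    # Shrink slightly below min(1, 2*lam_min) so the conditional
    # covariance stays strictly positive definite (illustrative choice).
    s = 0.99 * min(1.0, 2.0 * lam_min)
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)      # Sigma^{-1} D
    cond_mean = X - X @ Sigma_inv_D              # E[X_knockoff | X]
    cond_cov = 2.0 * D - D @ Sigma_inv_D         # Cov[X_knockoff | X]
    cond_cov = (cond_cov + cond_cov.T) / 2.0     # symmetrize against round-off
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal((n, p)) @ L.T

# Example with a hypothetical AR(1) correlation structure.
rng = np.random.default_rng(0)
p = 5
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=1000)
X_knockoff = gaussian_knockoffs(X, Sigma, rng)
```

Under these assumptions the knockoff columns have the same covariance Sigma as the originals, and each original column's covariance with its own knockoff is reduced by s. Note that the response never enters the construction.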
Findings
Demonstrates superior power over existing methods in simulations
Successfully applied to Crohn's disease data, making twice as many discoveries as the original analysis of the same data
Provides a robust, general approach for high-dimensional variable selection
Abstract
Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-X knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. Whereas the knockoffs procedure is constrained to homoscedastic linear models with n ≥ p, the key innovation here is that model-X knockoffs provide valid inference from finite samples in…
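The false discovery rate control mentioned in the abstract comes from a data-dependent threshold applied to per-variable statistics. The sketch below shows the standard knockoff+ selection rule under the assumption that a vector W has already been computed, where a large positive W[j] suggests variable j matters more than its knockoff and the signs of null statistics are symmetric. The function name `knockoff_plus_select` and the example values are hypothetical.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ rule at target FDR level q.

    W: array of feature statistics (assumed already computed, e.g. by
       comparing importance of each variable with that of its knockoff).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of W, smallest first.
    for t in np.sort(np.abs(W[W != 0])):
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)   # no threshold meets the target level

# Hypothetical statistics for ten variables.
W = np.array([5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, -0.3, 0.2])
selected = knockoff_plus_select(W, q=0.2)   # indices 0 through 5
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; with few large statistics it can make the rule select nothing at small q, as happens for these example values at q = 0.1.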
Taxonomy
Topics: Statistical Methods and Inference · Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials
