Novel Knockoff Generation and Importance Measures with Heterogeneous Data via Conditional Residuals and Local Gradients
Evan Mason, Zhe Fei

TL;DR
This paper introduces a flexible knockoff generation method for heterogeneous data and a new importance measure, MALD, suitable for complex models, improving variable selection accuracy and interpretability.
Contribution
It develops a distribution-free knockoff framework using conditional residuals and introduces MALD, a local gradient-based importance measure for nonlinear models.
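As a rough illustration of a local gradient-based importance measure, the sketch below averages the absolute partial derivative of a fitted prediction function over the observed samples, using central finite differences. The function name `mald`, the step size `eps`, and the finite-difference estimator are assumptions made for this example; the summary does not state how the paper estimates local derivatives, and a plain finite difference is only meaningful for smooth predictors (it would be mostly zero for piecewise-constant models such as tree ensembles).

```python
import numpy as np

def mald(predict, X, eps=1e-4):
    """Mean absolute local derivative per feature (hypothetical sketch).

    predict: callable mapping an (n, p) array to an (n,) array of predictions.
    X: (n, p) array of continuous features.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        # Perturb only feature j, holding the other features fixed.
        step = np.zeros(p)
        step[j] = eps
        # Central difference approximates the partial derivative at each sample.
        deriv = (predict(X + step) - predict(X - step)) / (2 * eps)
        scores[j] = np.mean(np.abs(deriv))
    return scores

# Toy example: a smooth function of the first two features only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
f = lambda Z: 2.0 * Z[:, 0] + np.sin(Z[:, 1])
scores = mald(f, X)
```

In this toy setup the first feature enters linearly with slope 2, the second enters through a sine, and the third is unused, so its score is zero.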
Findings
Better false discovery rate control in simulations
Higher power in variable selection tasks
Successful application to DNA methylation data
Abstract
Knockoff variable selection is a powerful framework that creates synthetic knockoff variables to mirror the correlation structure of the observed features, enabling principled control of the false discovery rate in variable selection. However, existing methods often assume homogeneous data types or known distributions, limiting their applicability in real-world settings with heterogeneous, distribution-free data. Moreover, common variable importance measures rely on linear outcome models, hindering their effectiveness for complex relationships. We propose a flexible knockoff generation framework based on conditional residuals that accommodates mixed data types without assuming known distributions. To assess variable importance, we introduce the Mean Absolute Local Derivative (MALD), an interpretable metric compatible with nonlinear outcome functions, including random forests and neural…
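The false discovery rate control mentioned above is typically obtained by comparing each feature's importance with that of its knockoff and applying a data-dependent threshold. The sketch below implements the standard knockoff+ selection rule from the general knockoff literature. Building the statistics `W` as "original importance minus knockoff importance" (for instance from MALD scores) is an assumption for illustration, as are the function name and the toy values; the summary does not spell out the paper's exact statistic.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Select features with the knockoff+ threshold at target FDR level q.

    W: array of knockoff statistics; large positive values favor the
       original feature over its knockoff.
    Returns the indices of the selected features.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int)

# Hypothetical statistics for 12 features.
W = np.array([5.0, 4.0, 3.5, 3.0, 2.5, 2.2, 2.0, 1.8, 1.5, 1.2, -0.3, 0.1])
selected = knockoff_plus_select(W, q=0.2)
strict = knockoff_plus_select(W, q=0.05)
```

The `1 +` in the numerator is what distinguishes knockoff+ from the plain knockoff threshold and makes the rule more conservative when few features are selected.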
Taxonomy
Topics: Image and Signal Denoising Methods
