Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models
Changhu Wang, Ziheng Zhang, Jingyi Jessica Li

TL;DR
Nullstrap is a new framework for high-dimensional variable selection that controls the false discovery rate (FDR) without altering the original data. It generates synthetic null data and runs the same estimation procedure on both datasets in parallel to improve power and robustness.
Contribution
Nullstrap introduces a data-driven null model approach that maintains original data integrity and offers theoretical FDR control with high power in diverse high-dimensional models.
Findings
Achieves asymptotic FDR control at any target level
Converges to high power in large samples
Outperforms existing methods in simulations and real data
Abstract
Balancing false discovery rate (FDR) control with high statistical power remains a central challenge in high-dimensional variable selection. While several FDR-controlling methods have been proposed, many degrade the original data -- by adding knockoff variables or splitting the data -- which often leads to substantial power loss and hampers detection of true signals. We introduce Nullstrap, a novel framework that controls FDR without altering the original data. Nullstrap generates synthetic null data by fitting a null model under the global null hypothesis that no variables are important. It then applies the same estimation procedure in parallel to both the original and synthetic data. This parallel approach mirrors that of the classical likelihood ratio test, making Nullstrap its numerical analog. By adjusting the synthetic null coefficient estimates through a data-driven correction…
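The parallel-estimation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the estimator is a per-variable least-squares slope (a stand-in for whatever procedure is used in practice), the null model is an intercept-only Gaussian fit, and the threshold rule (the smallest cutoff at which the ratio of null to original exceedances is at most q) is a generic choice. The data-driven correction mentioned in the abstract is omitted because its details are not given here. The names `marginal_coefs` and `nullstrap_sketch` are hypothetical.

```python
import random
import statistics


def marginal_coefs(X, y):
    """Per-variable least-squares slope (assumed stand-in estimator)."""
    n, p = len(X), len(X[0])
    y_bar = statistics.fmean(y)
    coefs = []
    for j in range(p):
        col = [row[j] for row in X]
        x_bar = statistics.fmean(col)
        sxy = sum((col[i] - x_bar) * (y[i] - y_bar) for i in range(n))
        sxx = sum((c - x_bar) ** 2 for c in col)
        coefs.append(sxy / sxx)
    return coefs


def nullstrap_sketch(X, y, q=0.1, seed=0):
    """Illustrative selection rule; the paper's correction step is omitted."""
    rng = random.Random(seed)

    # Fit the null model: under the global null, y is noise around its mean.
    mu_hat = statistics.fmean(y)
    sigma_hat = statistics.stdev(y)

    # Generate synthetic null responses; X itself is left untouched.
    y_null = [rng.gauss(mu_hat, sigma_hat) for _ in y]

    # Apply the same estimation procedure to original and synthetic data.
    b_orig = [abs(b) for b in marginal_coefs(X, y)]
    b_null = [abs(b) for b in marginal_coefs(X, y_null)]

    # Assumed rule: smallest threshold whose estimated FDP is at most q.
    for t in sorted(b_orig):
        n_orig = sum(b >= t for b in b_orig)  # at least 1, since t is in b_orig
        n_null = sum(b >= t for b in b_null)
        if n_null / n_orig <= q:
            return [j for j, b in enumerate(b_orig) if b >= t], t
    return [], float("inf")


# Toy data: the first k of p variables carry signal.
data_rng = random.Random(1)
n, p, k = 200, 30, 5
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(row[:k]) + data_rng.gauss(0, 1) for row in X]

selected, t = nullstrap_sketch(X, y, q=0.1)
print(selected, t)
```

Because the synthetic responses are drawn from the fitted null model, the coefficients estimated from them give a reference for how large estimates can be when no variable matters. The threshold compares the original estimates against that reference.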
Taxonomy
Topics: Fault Detection and Control Systems
