Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models

Changhu Wang; Ziheng Zhang; Jingyi Jessica Li

arXiv:2501.05012 · stat.ME · July 16, 2025

TL;DR

Nullstrap is a new framework for high-dimensional variable selection that controls false discovery rate without data alteration, using synthetic null data and parallel estimation to improve power and robustness.

Contribution

Nullstrap introduces a data-driven null model approach that maintains original data integrity and offers theoretical FDR control with high power in diverse high-dimensional models.

Findings

01 Achieves asymptotic FDR control at any level

02 Converges to high power in large samples

03 Outperforms existing methods in simulations and real data

Abstract

Balancing false discovery rate (FDR) control with high statistical power remains a central challenge in high-dimensional variable selection. While several FDR-controlling methods have been proposed, many degrade the original data -- by adding knockoff variables or splitting the data -- which often leads to substantial power loss and hampers detection of true signals. We introduce Nullstrap, a novel framework that controls FDR without altering the original data. Nullstrap generates synthetic null data by fitting a null model under the global null hypothesis that no variables are important. It then applies the same estimation procedure in parallel to both the original and synthetic data. This parallel approach mirrors that of the classical likelihood ratio test, making Nullstrap its numerical analog. By adjusting the synthetic null coefficient estimates through a data-driven correction…
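The parallel-fit idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear-model setting, the ridge estimator used as the shared estimation procedure, the function names, and the thresholding rule (the smallest cutoff whose estimated false discovery proportion is at most `q`) are all choices made here for illustration. The abstract is truncated before it specifies the data-driven correction, so `correction` is left as a placeholder argument that defaults to zero.

```python
import numpy as np


def ridge_coefs(X, y, lam=1.0):
    """Stand-in estimator (an assumption): ridge regression coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def null_parallel_select(X, y, q=0.1, correction=0.0, lam=1.0, seed=0):
    """Select variables by comparing original and synthetic-null estimates.

    `correction` is a hypothetical placeholder; the abstract does not
    state how the correction is computed.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Fit the global null (no variable matters): y is noise around its mean.
    mu_hat, sigma_hat = y.mean(), y.std(ddof=1)
    y_null = mu_hat + sigma_hat * rng.standard_normal(n)

    # Apply the same estimator to the original and the synthetic null response.
    b_orig = np.abs(ridge_coefs(X, y - y.mean(), lam))
    b_null = np.abs(ridge_coefs(X, y_null - y_null.mean(), lam)) + correction

    # Assumed rule: take the smallest threshold whose estimated FDP is <= q.
    for t in np.sort(b_orig):
        n_selected = np.sum(b_orig >= t)
        fdp_hat = np.sum(b_null >= t) / max(n_selected, 1)
        if fdp_hat <= q:
            return np.flatnonzero(b_orig >= t)
    return np.array([], dtype=int)
```

The design matrix `X` and the response `y` are never modified, split, or augmented; only a synthetic response is drawn, which is the property the abstract contrasts with knockoff and data-splitting approaches.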

Taxonomy

Topics: Fault Detection and Control Systems