Virtual Dummies: Enabling Scalable FDR-Controlled Variable Selection via Sequential Sampling of Null Features
Taulant Koka, Jasin Machkour, Daniel P. Palomar, Michael Muma

TL;DR
This paper introduces Virtual Dummy LARS (VD-LARS), a scalable method for FDR-controlled variable selection in high-dimensional genomics that reduces memory and computation while maintaining theoretical guarantees.
Contribution
The paper formalizes the information flow in dummy-based selection and derives an adaptive sampling method that eliminates the need for large dummy matrices, enabling scalable FDR control.
Findings
VD-LARS reduces memory and runtime by several orders of magnitude.
The method preserves the exact FDR control and selection law of the original T-Rex selector.
Experiments show VD-T-Rex effectively controls FDR and maintains power in large-scale genome studies.
Abstract
High-dimensional variable selection, particularly in genomics, requires error-controlling procedures that scale to millions of predictors. The Terminating-Random Experiments (T-Rex) selector achieves false discovery rate (FDR) control by aggregating results of early terminated random experiments, each combining original predictors with i.i.d. synthetic null variables (dummies). At biobank scales, however, explicit dummy augmentation requires terabytes of memory. We demonstrate that this bottleneck is not fundamental. Formalizing the information flow of forward selection through a filtration, we show that compatible selectors interact with unselected dummies solely through projections onto an adaptively evolving low-dimensional subspace. For rotationally invariant dummy distributions, we derive an adaptive stick-breaking construction sampling these projections from their exact…
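The idea of sampling projections instead of storing whole dummies can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single dummy drawn uniformly from the unit sphere in R^n (one example of a rotationally invariant distribution) and a set of k orthonormal directions, and it relies on the standard fact that the squared projection onto each new direction takes a Beta(1/2, (n - j - 1)/2) fraction of the squared norm still unallocated after j directions. The function name `sample_sphere_projections` and its parameters are hypothetical.

```python
import numpy as np


def sample_sphere_projections(n, k, rng):
    """Sample the projections of a uniform unit vector in R^n onto k
    orthonormal directions, one at a time, without building the vector.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    Returns the k projections and the squared norm left in the
    unexplored (n - k)-dimensional complement.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    projections = np.empty(k)
    remaining = 1.0  # squared norm not yet assigned to any direction
    for j in range(k):
        # Fraction of the remaining squared norm that falls on direction j.
        frac = rng.beta(0.5, (n - j - 1) / 2.0)
        # By symmetry the sign of the projection is a fair coin flip.
        sign = rng.choice([-1.0, 1.0])
        projections[j] = sign * np.sqrt(remaining * frac)
        remaining *= 1.0 - frac
    return projections, remaining


rng = np.random.default_rng(0)
proj, rest = sample_sphere_projections(n=1_000_000, k=5, rng=rng)
print(proj, rest)
```

Each call stores k numbers and one scalar rather than an n-dimensional vector, which is the kind of saving the abstract attributes to working with projections onto a low-dimensional subspace.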
