Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop
Fabian Scheidt, Jasin Machkour, Michael Muma

TL;DR
This paper introduces Big T-Rex, an efficient implementation that enables FDR-controlled sparse regression with five million variables on a laptop by drastically reducing memory and computation requirements.
Contribution
The paper presents Big T-Rex, a memory-efficient version of the T-Rex selector, allowing large-scale FDR-controlled regression problems to be solved on standard laptops.
Findings
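Big T-Rex significantly reduces RAM consumption compared with the original T-Rex implementation.
It can solve problems with 5 million variables in 30 minutes.
It demonstrates the practical feasibility of FDR-controlled variable selection for large-scale, high-dimensional data.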
Abstract
Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive and two new dummy…
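The abstract states that Big T-Rex memory-maps matrices stored on a solid-state drive instead of holding them in RAM. The following is a minimal Python/NumPy sketch of that general idea, not the Big T-Rex implementation: the function names, the raw float64 file format, and the column-major layout are assumptions chosen for illustration. It computes the inner products X^T r between the predictors and a residual vector (proportional to correlations when the columns are standardized), which forward-selection solvers such as LARS recompute at every step, while paging in only one block of columns at a time.

```python
import os
import tempfile

import numpy as np


def write_column_major(path, X):
    """Hypothetical helper: store a dense float64 matrix on disk in column-major order."""
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=X.shape, order="F")
    mm[:] = X   # copy the data into the file-backed array
    mm.flush()  # force the pages to be written to disk
    del mm      # release the mapping


def open_column_major(path, shape):
    """Hypothetical helper: map the on-disk matrix read-only; nothing is read until indexed."""
    return np.memmap(path, dtype=np.float64, mode="r", shape=shape, order="F")


def correlations(X_mm, r, block=256):
    """Compute X^T r one block of columns at a time.

    With column-major storage, X_mm[:, start:stop] is a contiguous region of the
    file, so each step pages in roughly n * block values rather than all n * p.
    """
    n, p = X_mm.shape
    out = np.empty(p, dtype=np.float64)
    for start in range(0, p, block):
        stop = min(start + block, p)
        out[start:stop] = X_mm[:, start:stop].T @ r
    return out
```

For example, after `write_column_major("X.bin", X)`, calling `correlations(open_column_major("X.bin", X.shape), r)` returns the same values as `X.T @ r`, but the operating system only needs to keep the currently accessed column block resident. Column-major storage is assumed here because column-wise access is the natural pattern for variable selection over millions of predictors.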
