Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop

Fabian Scheidt; Jasin Machkour; Michael Muma

arXiv:2409.19088·eess.SP·October 1, 2024·CAMSAP

TL;DR

This paper introduces Big T-Rex, an efficient implementation that enables FDR-controlled sparse regression with five million variables on a laptop by drastically reducing memory and computation requirements.

Contribution

The paper presents Big T-Rex, a memory-efficient version of the T-Rex selector, allowing large-scale FDR-controlled regression problems to be solved on standard laptops.

Findings

01

Big T-Rex drastically reduces the RAM consumption of the T-Rex selector.

02

It solves FDR-controlled sparse regression problems with five million variables on a laptop in 30 minutes.

03

The results demonstrate that FDR-controlled variable selection is practically feasible for large-scale, high-dimensional data.

Abstract

Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive and two new dummy…
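The memory-mapping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Big T-Rex implementation; it only assumes that a predictor matrix is stored on disk in column-major order and shows how NumPy's `np.memmap` lets a program compute the inner products of all columns with a response vector while holding only one block of columns in RAM at a time. The function name `column_inner_products`, the file name `X.dat`, and the block size are hypothetical choices made for this illustration.

```python
import os
import tempfile

import numpy as np


def column_inner_products(path, shape, y, block_cols=1024, dtype=np.float64):
    """Compute X.T @ y for a matrix X stored on disk, one column block at a time.

    Hypothetical helper for illustration only; not part of any T-Rex package.
    """
    n, p = shape
    # Open the file read-only; pages are loaded from disk only when accessed.
    X = np.memmap(path, dtype=dtype, mode="r", shape=(n, p), order="F")
    out = np.empty(p, dtype=np.float64)
    for start in range(0, p, block_cols):
        stop = min(start + block_cols, p)
        # In column-major layout, a block of columns is contiguous on disk.
        block = np.asarray(X[:, start:stop])
        out[start:stop] = block.T @ y
    return out


# Small demo: write a matrix to disk, then process it block by block.
rng = np.random.default_rng(0)
n, p = 50, 300
X_ram = rng.standard_normal((n, p))
y = rng.standard_normal(n)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "X.dat")  # assumed file name
X_disk = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, p), order="F")
X_disk[:] = X_ram
X_disk.flush()
del X_disk  # drop the writable mapping before reopening read-only

corr = column_inner_products(path, (n, p), y, block_cols=64)
```

In this sketch the peak RAM footprint is governed by `block_cols` rather than by the total number of variables, which is the general trade-off memory mapping offers: less memory in exchange for disk reads.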
