The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R
Yaohui Zeng, Patrick Breheny

TL;DR
The biglasso R package enables efficient lasso model fitting on massive high-dimensional data sets by using memory-mapped files and advanced screening rules, surpassing existing tools in memory and computation efficiency.
Contribution
Introduces biglasso, an R package that handles ultrahigh-dimensional data with out-of-core computation and improved screening rules, filling a gap in existing software.
Findings
biglasso outperforms glmnet in both memory efficiency and computation speed.
Successfully analyzes a 31 GB data set on a 16 GB RAM laptop.
Demonstrates effective out-of-core computation for massive data.
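To make the out-of-core idea concrete, here is a minimal sketch in Python (standard library only) of reading a single feature column from a memory-mapped file. It is not biglasso's implementation: the raw float64 column-major file layout and the helper names `write_column_major` and `column_dot` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_column_major(path, columns):
    """Write equal-length columns to disk as raw little-endian float64, one column after another."""
    n = len(columns[0])
    with open(path, "wb") as f:
        for col in columns:
            f.write(struct.pack(f"<{n}d", *col))
    return n, len(columns)


def column_dot(path, n, j, r):
    """Return the inner product of column j with vector r, decoding only that column's bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = j * n * 8  # 8 bytes per float64 value
            col = struct.unpack_from(f"<{n}d", mm, offset)
    return sum(x * ri for x, ri in zip(col, r))


# Toy design matrix with 3 observations and 3 features (hypothetical data).
columns = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
r = [1.0, 0.0, -1.0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.bin")
    n, p = write_column_major(path, columns)
    dots = [column_dot(path, n, j, r) for j in range(p)]

print(dots)
```

Because the mapping is demand-paged, only the pages that hold the requested column need to be resident in memory, which is the property that allows a data set larger than RAM to be processed one feature at a time.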
Abstract
Penalized regression models such as the lasso have been extensively applied to analyzing high-dimensional data sets. However, due to memory limitations, existing R packages like glmnet and ncvreg are not capable of fitting lasso-type models for ultrahigh-dimensional, multi-gigabyte data sets that are increasingly seen in many areas such as genetics, genomics, biomedical imaging, and high-frequency finance. In this research, we implement an R package called biglasso that tackles this challenge. biglasso utilizes memory-mapped files to store the massive data on disk, reading data into memory only when necessary during model fitting, and is thus able to handle out-of-core computation seamlessly. Moreover, it is equipped with newly proposed, more efficient feature screening rules, which substantially accelerate the computation. Benchmarking experiments show that our biglasso package, as…
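The role of a feature screening rule can be sketched as follows. The example uses the well-known sequential strong rule of Tibshirani et al. as a stand-in; it is not one of the newly proposed rules the abstract refers to. It assumes the lasso objective (1/(2n))·||y − Xβ||² + λ·||β||₁, and the function name `strong_rule_keep` is hypothetical.

```python
def strong_rule_keep(X_cols, residual, lam_prev, lam_next):
    """Return indices of features that survive the sequential strong rule.

    Feature j is discarded when |x_j' r| / n < 2 * lam_next - lam_prev,
    where r is the residual at the solution for lam_prev.
    """
    n = len(residual)
    threshold = 2.0 * lam_next - lam_prev
    keep = []
    for j, col in enumerate(X_cols):
        score = abs(sum(x * r for x, r in zip(col, residual))) / n
        if score >= threshold:
            keep.append(j)
    return keep


# Toy data (hypothetical): 4 observations, 3 features stored column by column.
X_cols = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
]
y = [2.0, -1.5, 1.0, -1.5]
n_obs = len(y)

# At lam_max every coefficient is zero, so the residual equals y.
lam_max = max(abs(sum(x * yi for x, yi in zip(col, y))) / n_obs for col in X_cols)
lam_next = 0.9 * lam_max

keep = strong_rule_keep(X_cols, y, lam_max, lam_next)
print(lam_max, keep)
```

Only the surviving features need to be passed to the coordinate descent solver at the next value of λ. Because the strong rule is a heuristic, it can occasionally discard an active feature, so a check of the optimality (KKT) conditions on the discarded features is required after fitting.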
