Efficient Data Reduction Strategies for Big Data and High-Dimensional LASSO Regressions
Xin Wang, Min Yang, and William Li

TL;DR
This paper extends data reduction strategies to high-dimensional LASSO regressions, proposing methods that improve efficiency and accuracy for datasets in which both n and p are large, through variable and data point reduction.
Contribution
It develops theoretical results supporting IBOSS-type data point reduction for LASSO, and introduces a two-step approach for big data LASSO problems that first reduces the number of variables and then the number of data points.
Findings
Proposed algorithms outperform existing methods in accuracy and speed.
Theoretical results support the applicability of IBOSS-like methods to LASSO.
Simulation studies demonstrate superior performance in large-scale settings.
Abstract
The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many practical problems, p is very large and penalty-based model fitting methods such as LASSO are used. We study big data problems in which both n and p are large. In the first part, we focus on reduction in data points. We develop theoretical results showing that the IBOSS type of approach can be applicable to penalty-based regressions such as LASSO. In the second part, we consider situations where p is extremely large. We propose a two-step approach that involves first reducing the number of variables and then reducing the number of data points. Two separate algorithms are developed, whose performances are studied through extensive simulation studies.…
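The two-step idea in the abstract (reduce variables, then reduce data points) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not say how variables are reduced, so `screen_variables` below is a hypothetical stand-in that uses marginal correlation screening, and `iboss_select` follows the commonly described IBOSS rule of keeping, for each covariate in turn, the rows with the smallest and largest values among those not yet chosen. All function names and the parameters `d` (variables kept) and `k` (rows kept) are assumptions made for illustration.

```python
import numpy as np

def iboss_select(X, k):
    """Pick about k rows by taking extreme values of each covariate in turn."""
    n, p = X.shape
    r = max(1, k // (2 * p))            # rows taken from each tail, per covariate
    available = np.ones(n, dtype=bool)  # rows not yet selected
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        if idx.size <= 2 * r:           # too few rows left: take them all and stop
            chosen.extend(idx.tolist())
            available[idx] = False
            break
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])  # r smallest, r largest
        chosen.extend(picked.tolist())
        available[picked] = False
    return np.array(chosen)

def screen_variables(X, y, d):
    """Hypothetical step 1: keep the d columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0)
    return np.sort(np.argsort(corr)[-d:])

def two_step_reduce(X, y, d, k):
    """Step 1: reduce variables. Step 2: reduce data points on the kept variables."""
    cols = screen_variables(X, y, d)
    rows = iboss_select(X[:, cols], k)
    return rows, cols
```

A penalized fit such as LASSO would then be run on `X[rows][:, cols]` and `y[rows]` only, which is where the computational saving would come from. A full-scale implementation would likely replace the sort with a partial selection (for example `np.partition`) to avoid sorting every column.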
Taxonomy
Topics: Advanced Statistical Methods and Models · Machine Learning and Data Classification · Statistical Methods and Inference
