A Communication-Efficient Parallel Method for Group-Lasso
Binghong Chen, Jun Zhu

TL;DR
This paper introduces DC-gLasso, a scalable divide-and-conquer parallel algorithm for group-Lasso that handles large datasets efficiently, provably recovers the true model under certain conditions, and extends to overlapping groups; experiments show large speedups without loss of regression accuracy.
Contribution
The paper proposes a novel divide-and-conquer parallel algorithm for group-Lasso that is scalable, fast, and extensible to overlapping groups, with theoretical guarantees of correct model recovery under certain conditions.
Findings
DC-gLasso needs only two iterations to collect and aggregate local estimates computed on subsets of the data.
It significantly improves computational efficiency on large datasets.
Empirical results show no loss in regression accuracy.
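The summary does not spell out how the local estimates are aggregated. The following is a minimal sketch of a two-round divide-and-conquer scheme, assuming a median (majority-vote) selection rule: round 1 fits gLasso on each data subset and votes on which groups are active; round 2 refits least squares on the selected groups in each subset and averages the results. The function names `group_lasso` and `dc_group_lasso`, the ISTA (proximal-gradient) solver, the `sqrt(|g|)` group weights, and the strict-majority threshold are all hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks the whole block, or zeroes it."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """ISTA for (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2.
    Assumes `groups` is a list of index arrays that partition the columns."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in groups:
            beta[g] = block_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta

def dc_group_lasso(X, y, groups, lam, m, seed=0):
    """Two-round divide-and-conquer sketch (hypothetical aggregation rule)."""
    n, p = X.shape
    parts = np.array_split(np.random.default_rng(seed).permutation(n), m)
    # Round 1: each "worker" fits gLasso on its subset and reports selected groups.
    votes = np.zeros(len(groups))
    for idx in parts:  # sequential loop standing in for m parallel workers
        b = group_lasso(X[idx], y[idx], groups, lam)
        votes += np.array([np.linalg.norm(b[g]) > 0 for g in groups], dtype=float)
    # Keep groups selected by a strict majority of subsets (median selection).
    selected = [g for g, v in zip(groups, votes) if v > m / 2]
    beta = np.zeros(p)
    if not selected:
        return beta, selected
    cols = np.concatenate(selected)
    # Round 2: least squares on the selected columns per subset, then average.
    local = [np.linalg.lstsq(X[idx][:, cols], y[idx], rcond=None)[0] for idx in parts]
    beta[cols] = np.mean(local, axis=0)
    return beta, selected

# Example on synthetic data: only the first of four groups is active.
rng = np.random.default_rng(1)
groups = [np.arange(i, i + 3) for i in range(0, 12, 3)]
X = rng.standard_normal((2000, 12))
beta_true = np.zeros(12)
beta_true[:3] = [1.5, -2.0, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(2000)
beta_hat, sel = dc_group_lasso(X, y, groups, lam=0.05, m=4)
```

Only the two rounds require communication: workers send group-selection indicators, then local refitted coefficients. On a well-conditioned toy problem like this one, the sketch should keep only the active group and return coefficients close to `beta_true`; real data would call for tuning `lam` and a faster local solver.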
Abstract
Group-Lasso (gLasso) identifies important explanatory factors for predicting the response variable by taking into account the grouping structure over input variables. However, most existing gLasso algorithms do not scale to large datasets, which are becoming the norm in many applications. In this paper, we present a divide-and-conquer parallel algorithm (DC-gLasso) that scales up gLasso for regression tasks with grouping structures. DC-gLasso needs only two iterations to collect and aggregate the local estimates computed on subsets of the data, and it provably recovers the true model under certain conditions. We further extend it to handle overlaps between groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso significantly improves time efficiency without sacrificing regression accuracy.
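The abstract states that DC-gLasso is extended to overlapping groups but not how. One standard device, assumed here rather than taken from the paper, is the latent (overlapping) group Lasso: duplicate every column once per group that contains it, so the groups become disjoint, run an ordinary disjoint-group solver, and sum the copies back into one coefficient per variable. The helper names `expand_overlapping` and `collapse` are hypothetical.

```python
import numpy as np

def expand_overlapping(X, groups):
    """Duplicate shared columns so overlapping groups become disjoint blocks."""
    cols = np.concatenate(groups)  # original column index of each latent copy
    X_exp = X[:, cols]
    sizes = [len(g) for g in groups]
    starts = np.cumsum([0] + sizes[:-1])
    new_groups = [np.arange(s, s + k) for s, k in zip(starts, sizes)]
    return X_exp, new_groups, cols

def collapse(beta_exp, cols, p):
    """Sum the latent copies back into one coefficient per original column."""
    beta = np.zeros(p)
    np.add.at(beta, cols, beta_exp)  # unbuffered add handles repeated indices
    return beta
```

Because `X_exp @ beta_exp` equals `X @ collapse(beta_exp, cols, p)`, the fitted values are unchanged; only the penalty differs, applying to each latent copy separately so that a variable can enter the model through any group that contains it.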
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques
