Distributed Coordinate Descent for Generalized Linear Models with Regularization
Ilya Trofimov, Alexander Genkin

TL;DR
This paper introduces a scalable distributed coordinate descent algorithm for regularized generalized linear models, effectively handling large, sparse datasets in cluster environments with proven convergence.
Contribution
It presents a novel feature-wise data splitting and coordinate descent algorithm with convergence proof, optimized for distributed environments and large-scale sparse data.
Findings
Algorithm is scalable and efficient on large datasets
Outperforms existing methods in speed and accuracy for sparse data
Addresses slow node issues with modifications
Abstract
Generalized linear model with L1 and L2 regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in the distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow node problem. For an important particular case of logistic regression we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods.…
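The three steps named in the abstract (feature-wise split, per-node coordinate descent, global line search) can be illustrated with a small single-process sketch. This is not the authors' implementation: the "nodes" are simulated by a loop over feature blocks, the model is logistic regression with labels in {-1, +1} and an L1 plus L2 penalty, each node makes one coordinate descent pass over a local quadratic approximation of the loss, and the merge uses a simple backtracking (Armijo-style) line search. All function names, parameter names, and constants (`fit_feature_split`, `node_step`, `sigma`, the step-halving schedule) are assumptions made for illustration.

```python
import numpy as np

def penalty(beta, lam1, lam2):
    """Elastic-net style penalty: lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2."""
    return lam1 * np.abs(beta).sum() + 0.5 * lam2 * (beta @ beta)

def objective(margins, y, beta, lam1, lam2):
    """Logistic loss on margins = X @ beta, plus the penalty."""
    return np.sum(np.logaddexp(0.0, -y * margins)) + penalty(beta, lam1, lam2)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def node_step(X_block, beta_block, grad_margin, w, lam1, lam2):
    """One coordinate descent pass over the features owned by one (simulated) node.

    Minimizes a quadratic model of the loss plus the exact penalty,
    touching only this node's columns. Returns the proposed change delta.
    """
    delta = np.zeros_like(beta_block)
    xd = np.zeros(X_block.shape[0])          # X_block @ delta, kept up to date
    for j in range(X_block.shape[1]):
        xj = X_block[:, j]
        a = np.sum(w * xj * xj)              # curvature along coordinate j
        b = xj @ (grad_margin + w * xd)      # slope of the quadratic loss model
        cur = beta_block[j] + delta[j]
        new = soft_threshold(a * cur - b, lam1) / (a + lam2 + 1e-12)
        xd += (new - cur) * xj
        delta[j] = new - beta_block[j]
    return delta

def fit_feature_split(X, y, blocks, lam1, lam2, n_iter=100, sigma=0.01):
    """blocks: list of index arrays, one per simulated node (disjoint features)."""
    n, p = X.shape
    beta, margins = np.zeros(p), np.zeros(n)
    history = [objective(margins, y, beta, lam1, lam2)]
    for _ in range(n_iter):
        s = np.exp(-np.logaddexp(0.0, y * margins))   # sigmoid(-y * margin)
        grad_margin, w = -y * s, s * (1.0 - s)
        delta = np.zeros(p)
        for idx in blocks:                   # on a cluster these run in parallel
            delta[idx] = node_step(X[:, idx], beta[idx], grad_margin, w, lam1, lam2)
        if not np.any(delta):
            break                            # no node proposes a change
        x_delta = X @ delta                  # on a cluster: sum of per-node products
        # Predicted decrease used in the sufficient-decrease test.
        D = grad_margin @ x_delta + penalty(beta + delta, lam1, lam2) - penalty(beta, lam1, lam2)
        alpha = 1.0
        for _ in range(50):                  # backtracking line search on the merged step
            f_new = objective(margins + alpha * x_delta, y, beta + alpha * delta, lam1, lam2)
            if f_new <= history[-1] + sigma * alpha * D:
                break
            alpha *= 0.5
        else:
            break                            # no acceptable step length found
        beta += alpha * delta
        margins += alpha * x_delta
        history.append(f_new)
    return beta, history
```

A call might look like `fit_feature_split(X, y, np.array_split(np.arange(X.shape[1]), 3), lam1=1.0, lam2=0.1)`, which simulates three nodes. The line search is what makes the merge safe: each node optimizes its block as if the other blocks stayed fixed, so the summed step can overshoot, and shrinking it by `alpha` restores a decrease in the global objective.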
Taxonomy
Topics: Face and Expression Recognition · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
