A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems
Rui Zhu, Di Niu

TL;DR
This paper introduces AsyB-ProxSGD, an asynchronous model parallel stochastic gradient algorithm designed for large-scale machine learning models, achieving convergence and near-linear speedup in distributed settings.
Contribution
It generalizes ProxSGD to asynchronous, model parallel environments and provides theoretical convergence guarantees along with practical implementation results.
Findings
Achieves an $O(1/\sqrt{K})$ convergence rate for nonconvex problems, where $K$ is the number of iterations.
Demonstrates near-linear speedup with increasing workers.
Validates effectiveness on real-world large-scale datasets.
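As a rough reading of what an $O(1/\sqrt{K})$ rate means in practice, assume the stationarity measure is bounded by $C/\sqrt{K}$ for some problem-dependent constant $C$. Here $C$ and $\epsilon$ are placeholders, not quantities taken from the paper. Reaching a target accuracy $\epsilon$ then requires

$$
\frac{C}{\sqrt{K}} \le \epsilon \iff K \ge \frac{C^2}{\epsilon^2},
$$

so halving the target accuracy roughly quadruples the number of iterations needed.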
Abstract
Large models, which can have millions or even billions of parameters, are prevalent in modern machine learning scenarios such as deep learning and recommender systems. Parallel algorithms have become an essential solution technique for many large-scale machine learning jobs. In this paper, we propose a model parallel proximal stochastic gradient algorithm, AsyB-ProxSGD, which handles large models using model parallel blockwise updates while simultaneously handling a large amount of training data using proximal stochastic gradient descent (ProxSGD). In our algorithm, worker nodes communicate with the parameter servers asynchronously, and each worker performs a proximal stochastic gradient update on only one block of model parameters during each iteration. Our proposed algorithm generalizes ProxSGD to the asynchronous and model parallel setting. We prove that AsyB-ProxSGD achieves a…
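The blockwise update described in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The least-squares loss, the L1 regularizer (whose proximal operator is soft-thresholding), the uniform random block choice, and the bounded-staleness emulation of asynchronous reads are all illustrative choices. The function and parameter names (`asyb_prox_sgd_sketch`, `max_delay`, and so on) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def asyb_prox_sgd_sketch(A, b, lam, n_blocks=4, step=0.05, batch=32,
                         iters=2000, max_delay=3, seed=0):
    """Hypothetical simulation of asynchronous blockwise ProxSGD.

    Assumed objective: (1/2n) * ||A x - b||^2 + lam * ||x||_1.
    Each iteration updates a single block of x using a minibatch gradient
    computed at a possibly stale copy of x (staleness <= max_delay).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    blocks = np.array_split(np.arange(d), n_blocks)  # model parallel partition
    x = np.zeros(d)
    history = [x.copy()]  # recent server states, used to emulate stale reads

    for _ in range(iters):
        # Emulate asynchrony: the worker's read may be several updates old.
        delay = rng.integers(0, min(max_delay, len(history) - 1) + 1)
        x_stale = history[-1 - delay]

        # The worker picks one block and samples a minibatch of rows.
        idx = blocks[rng.integers(n_blocks)]
        rows = rng.choice(n, size=batch, replace=False)

        # Minibatch gradient of the smooth loss, restricted to this block.
        residual = A[rows] @ x_stale - b[rows]
        g_block = A[rows][:, idx].T @ residual / batch

        # Proximal step on the block only; the other blocks are untouched.
        x[idx] = soft_threshold(x[idx] - step * g_block, step * lam)

        history.append(x.copy())
        if len(history) > max_delay + 1:
            history.pop(0)
    return x


# Usage: recover a sparse vector from noise-free linear measurements.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((500, 20))
x_true = np.zeros(20)
x_true[[0, 5, 10, 15]] = [2.0, -1.5, 1.0, 3.0]
b = A @ x_true
x_hat = asyb_prox_sgd_sketch(A, b, lam=0.01)
print(np.round(x_hat, 2))
```

Each iteration touches only one block. In a real deployment, this is what lets different workers update different blocks, held by different parameter servers, in parallel. The `max_delay` parameter stands in for the bounded staleness that asynchronous communication introduces.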
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research
