A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems

Rui Zhu; Di Niu

arXiv:1810.09270 · cs.LG · October 23, 2018

TL;DR

This paper introduces AsyB-ProxSGD, an asynchronous, model parallel proximal stochastic gradient algorithm for training large-scale machine learning models, and shows that it converges and achieves near-linear speedup in distributed settings.

Contribution

The paper generalizes proximal stochastic gradient descent (ProxSGD) to the asynchronous, model parallel setting, proves convergence guarantees for the resulting algorithm, and reports results from a practical implementation.
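For reference, a standard ProxSGD step on a composite objective, and one natural blockwise form of it consistent with the description of updating a single parameter block per iteration, can be written as below. The notation ($f$, $h_j$, $M$, $\eta_k$, $j_k$, $\tau_k$, $\tilde{g}_{j_k}^k$) is ours, and the paper's exact formulation (indexing, delay model, step-size schedule) may differ.

```latex
% Composite objective: smooth stochastic loss plus a block-separable regularizer
\min_{x}\; \Psi(x) := f(x) + h(x), \qquad
f(x) = \mathbb{E}_{\xi}\big[F(x;\xi)\big], \qquad
h(x) = \sum_{j=1}^{M} h_j(x_j)

% Proximal operator
\operatorname{prox}_{\eta h}(v) = \arg\min_{u}\Big\{ h(u) + \tfrac{1}{2\eta}\,\|u - v\|_2^2 \Big\}

% ProxSGD: full-vector step with a minibatch stochastic gradient g^k at x^k
x^{k+1} = \operatorname{prox}_{\eta_k h}\big(x^k - \eta_k\, g^k\big)

% Blockwise, asynchronous form (assumed): only block j_k changes, using a
% gradient \tilde{g}_{j_k}^k evaluated at a possibly stale iterate x^{k-\tau_k}
x_{j_k}^{k+1} = \operatorname{prox}_{\eta_k h_{j_k}}\big(x_{j_k}^k - \eta_k\, \tilde{g}_{j_k}^k\big),
\qquad x_{j}^{k+1} = x_{j}^{k} \quad (j \neq j_k)
```

For example, with $h_j(x_j) = \lambda \|x_j\|_1$ the proximal operator reduces to elementwise soft-thresholding, so each block update stays cheap even when the full model is very large.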

Findings

01. Achieves an $O(1/\sqrt{K})$ convergence rate for nonconvex problems.
02. Demonstrates near-linear speedup as the number of workers increases.
03. Validates the algorithm's effectiveness on real-world large-scale datasets.

Abstract

Large models, which can have millions or even billions of parameters, are prevalent in modern machine learning scenarios such as deep learning and recommender systems. Parallel algorithms have become an essential solution technique for many large-scale machine learning jobs. In this paper, we propose a model parallel proximal stochastic gradient algorithm, AsyB-ProxSGD, that handles large models through model parallel blockwise updates while also handling large amounts of training data through proximal stochastic gradient descent (ProxSGD). In our algorithm, worker nodes communicate with the parameter servers asynchronously, and each worker performs a proximal stochastic gradient update on only one block of model parameters during each iteration. Our proposed algorithm generalizes ProxSGD to the asynchronous and model parallel setting. We prove that AsyB-ProxSGD achieves a…
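The abstract describes the core loop: workers exchange parameters with parameter servers asynchronously, and each applies a proximal stochastic gradient step to one parameter block. Below is a minimal, single-process Python sketch of that idea under stated assumptions: an L1-regularized least-squares objective, bounded staleness modeled by reading one of the last `max_delay + 1` server states, and uniform random block selection. All names (`async_block_prox_sgd`, `soft_threshold`, `max_delay`) are hypothetical; this is an illustration, not the authors' AsyB-ProxSGD implementation, and it uses no real threads or servers.

```python
import numpy as np
from collections import deque


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def async_block_prox_sgd(A, b, lam, n_blocks=4, step=0.05, batch=32,
                         max_delay=3, iters=3000, seed=0):
    """Sequentially simulate blockwise ProxSGD with bounded-delay reads.

    Hypothetical test objective (an assumption, not taken from the paper):
        (1 / (2n)) * ||A x - b||^2 + lam * ||x||_1
    Each iteration plays one worker: it reads a copy of x that may be up to
    `max_delay` server updates old, computes a minibatch gradient for one
    randomly chosen block, and the "server" applies a proximal step to that
    block of the current x only.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    blocks = np.array_split(np.arange(d), n_blocks)    # model partition
    x = np.zeros(d)                                     # server-side parameters
    history = deque([x.copy()], maxlen=max_delay + 1)   # recent server states

    for _ in range(iters):
        blk = blocks[int(rng.integers(n_blocks))]            # block updated now
        stale_x = history[int(rng.integers(len(history)))]   # possibly stale read
        idx = rng.choice(n, size=batch, replace=False)       # minibatch of samples
        resid = A[idx] @ stale_x - b[idx]
        grad_blk = A[idx][:, blk].T @ resid / batch          # gradient w.r.t. block
        # Server: proximal gradient step on this block (L1 prox = soft-threshold).
        x[blk] = soft_threshold(x[blk] - step * grad_blk, step * lam)
        history.append(x.copy())
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 20))
    x_true = np.zeros(20)
    x_true[[0, 7, 15]] = [2.0, -1.5, 1.0]
    b = A @ x_true + 0.01 * rng.standard_normal(500)
    x_hat = async_block_prox_sgd(A, b, lam=0.01)
    print("recovery error:", np.linalg.norm(x_hat - x_true))
```

Modeling staleness with a seeded history buffer keeps the sketch reproducible; a real deployment would instead run workers concurrently against parameter servers, where the delay is determined by timing rather than sampled.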

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research