Stochastic modified equations for the asynchronous stochastic gradient descent

Jing An; Jianfeng Lu; Lexing Ying

arXiv:1805.08244·stat.ML·March 4, 2020

TL;DR

This paper introduces a stochastic modified equation framework to model and analyze asynchronous stochastic gradient descent, providing insights into its dynamics and enabling the development of optimal mini-batching strategies.

Contribution

The paper develops a Langevin-type SME for ASGD that clarifies how different stochastic gradient algorithms relate to one another, predicts ASGD trajectories, and supports an optimal mini-batching strategy posed as an optimal control problem.

Findings

01. SME accurately predicts ASGD trajectories
02. Convergence of ASGD to the SME established
03. Optimal mini-batching strategy derived from SME

Abstract

We propose stochastic modified equations (SME) for modeling the asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous time limit, as well as the SME's precise prediction of the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD via solving the optimal control problem of the associated SME.
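For readers unfamiliar with ASGD, the toy sketch below illustrates the kind of discrete dynamics the abstract refers to: each update uses a gradient evaluated at a stale iterate. It is not the paper's SME or its experimental setup. The quadratic objective, the uniform staleness model, the Gaussian gradient noise, and every name and parameter (`asgd_quadratic`, `max_delay`, `noise`) are assumptions made purely for illustration.

```python
import random


def asgd_quadratic(steps=2000, lr=0.05, max_delay=3, noise=0.1, x0=5.0, seed=0):
    """Toy ASGD on f(x) = x**2 / 2 using stale, noisy gradients.

    Assumptions (not from the paper): staleness is uniform on
    {0, ..., max_delay} and the gradient noise is additive Gaussian.
    """
    rng = random.Random(seed)
    history = [x0]  # keep all past iterates so stale reads are possible
    for _ in range(steps):
        delay = rng.randint(0, max_delay)  # hypothetical staleness model
        stale_x = history[max(0, len(history) - 1 - delay)]
        # grad f(x) = x, evaluated at the stale iterate, plus noise
        grad = stale_x + noise * rng.gauss(0.0, 1.0)
        history.append(history[-1] - lr * grad)
    return history


trajectory = asgd_quadratic()
print(f"start = {trajectory[0]:.3f}, end = {trajectory[-1]:.3f}")
```

With a small step size relative to the maximum delay, the iterates settle into a noisy neighborhood of the minimizer at zero. A continuous-time model such as an SME aims to describe the statistics of trajectories like this one.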
