Continuous-time Models for Stochastic Optimization Algorithms

Antonio Orvieto; Aurelien Lucchi

arXiv:1810.02565·math.OC·March 12, 2020

TL;DR

This paper introduces continuous-time models for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. Using stochastic calculus and Lyapunov analysis, it derives convergence bounds for non-convex functions and new insights into the dynamics of SGD.

Contribution

It develops novel continuous-time formulations for stochastic algorithms and applies stochastic calculus to analyze their convergence and dynamics, bridging discrete and continuous perspectives.

Findings

01

Derived convergence bounds for non-convex functions.

02

Showed that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.

03

Showed that the same Lyapunov arguments hold in discrete time, giving rates that match those of the continuous-time models.

Abstract

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
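
As a rough illustration of the two ideas in the abstract, the sketch below uses a toy one-dimensional quadratic, which is an assumption made here and not an example taken from the paper. It assumes a commonly used SDE form, dX = -f'(X) dt + sqrt(eta) * sigma dB, whose Euler-Maruyama step with dt = eta reproduces an SGD step with Gaussian gradient noise. It also checks that, without noise, gradient descent with decreasing step sizes tracks the gradient flow evaluated at the warped time given by the sum of the step sizes. The names `A`, `run_sgd`, `run_euler_maruyama`, and `gradient_flow` are hypothetical.

```python
import math
import random

A = 1.0  # curvature of the toy objective f(x) = A * x**2 / 2 (assumed for illustration)


def grad(x):
    """Exact gradient of the toy objective."""
    return A * x


def run_sgd(x0, etas, sigma, seed=0):
    """SGD with additive Gaussian gradient noise of standard deviation sigma."""
    rng = random.Random(seed)
    x = x0
    for eta in etas:
        z = rng.gauss(0.0, 1.0)
        noisy_grad = grad(x) - sigma * z  # sign of the symmetric noise is a convention
        x = x - eta * noisy_grad
    return x


def run_euler_maruyama(x0, etas, sigma, seed=0):
    """Euler-Maruyama on dX = -f'(X) dt + sqrt(eta) * sigma dB, with dt = eta."""
    rng = random.Random(seed)
    x = x0
    for eta in etas:
        dt = eta
        dB = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment over dt
        x = x - grad(x) * dt + math.sqrt(eta) * sigma * dB
    return x


def gradient_flow(x0, t):
    """Closed-form solution of dx/dt = -A * x at time t."""
    return x0 * math.exp(-A * t)


# Decreasing learning-rate schedule (assumed): eta_k = 0.05 / sqrt(k + 1).
etas = [0.05 / math.sqrt(k + 1) for k in range(200)]
warped_time = sum(etas)  # continuous time reached after all steps

# Same seed, same noise draws: the two updates coincide up to rounding.
x_sgd = run_sgd(1.0, etas, sigma=0.5)
x_em = run_euler_maruyama(1.0, etas, sigma=0.5)

# Noise-free case: decreasing steps only change how fast time advances.
x_gd = run_sgd(1.0, etas, sigma=0.0)
x_flow = gradient_flow(1.0, warped_time)

print("SGD vs Euler-Maruyama gap:", abs(x_sgd - x_em))
print("GD vs gradient flow at warped time:", abs(x_gd - x_flow))
```

In this toy setting a shrinking step size does not change the path the noise-free iterate follows, only the speed at which it moves along the gradient-flow trajectory. This is one informal way to read the time-warping statement. The paper's actual models and proofs cover the stochastic, non-convex case and are not reproduced here.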

Code & Models

Repositories

aorvieto/SGD-SVRG-models

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent