The Role of Memory in Stochastic Optimization

Antonio Orvieto; Jonas Kohler; Aurelien Lucchi

arXiv:1907.01678 · cs.LG · March 13, 2020 · 6 cites

TL;DR

This paper uses stochastic differential equations to analyze how different memory mechanisms in gradient-based optimization algorithms affect convergence, stability, and performance in convex and nonconvex stochastic settings.

Contribution

It introduces a continuous-time model for arbitrary memory types, derives convergence guarantees, and proposes a flexible discretized algorithm with improved stability over classical momentum methods.
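The idea of a tunable memory can be illustrated with a minimal sketch. It assumes that the update direction is a normalized weighted average of all past stochastic gradients. It is not the authors' discretization. The weight schedules `exponential_weights` (short-term memory) and `polynomial_weights` (a hypothetical long-term schedule, weight (j+1)^(p-1) for gradient j), the helper `memory_sgd`, the toy quadratic, and every parameter value are illustrative assumptions.

```python
import numpy as np

def exponential_weights(k, beta=0.9):
    # Short-term memory: gradient j gets weight beta**(k - j), so old gradients fade quickly.
    j = np.arange(k + 1)
    return beta ** (k - j)

def polynomial_weights(k, p=2.0):
    # Hypothetical long-term memory: gradient j gets weight (j + 1)**(p - 1),
    # so old gradients are never forgotten.
    j = np.arange(k + 1)
    return (j + 1.0) ** (p - 1.0)

def memory_sgd(grad, x0, weights_fn, lr=0.1, steps=200, noise=0.1, seed=0):
    """Step along a normalized weighted average of all past stochastic gradients."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    history = []
    for k in range(steps):
        g = grad(x) + noise * rng.standard_normal(x.shape)  # noisy gradient oracle
        history.append(g)
        w = weights_fn(k)
        direction = (w[:, None] * np.array(history)).sum(axis=0) / w.sum()
        x = x - lr * direction
    return x

# Toy strongly convex quadratic f(x) = 0.5 * ||x||^2, minimized at the origin.
grad = lambda x: x
x_short = memory_sgd(grad, [5.0, -3.0], exponential_weights)
x_long = memory_sgd(grad, [5.0, -3.0], polynomial_weights)
print(np.linalg.norm(x_short), np.linalg.norm(x_long))
```

With exponential weights, the normalized direction equals a bias-corrected exponential moving average of the gradients, the same form as Adam's first-moment estimate. Swapping `weights_fn` is the only change needed to move along the memory spectrum. Storing the full history makes this sketch quadratic in the number of steps, so it is meant for illustration only.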

Findings

1. Memory choice significantly affects convergence and stability.
2. The proposed algorithm outperforms classical momentum in stochastic convex optimization.
3. Long-term memory improves second-moment estimation in adaptive methods such as Adam.
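The third finding can be illustrated with a toy experiment. This is not the paper's experiment. It assumes i.i.d. Gaussian scalar gradients with a known second moment E[g^2] = mu^2 + sigma^2. It then compares an RMSprop/Adam-style exponential moving average of g^2 (short-term memory, with Adam's bias correction) against a uniform average over the whole history (long-term memory). The value `beta2 = 0.9` is chosen to make the memory deliberately short. Adam's commonly used default is 0.999. The function names are hypothetical.

```python
import numpy as np

def ema_second_moment(grads, beta2=0.9):
    # Short-term memory in the style of RMSprop/Adam:
    # v_k = beta2 * v_{k-1} + (1 - beta2) * g_k**2, followed by Adam's bias correction.
    v = 0.0
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g**2
    return v / (1.0 - beta2 ** len(grads))

def uniform_second_moment(grads):
    # Long-term memory: every past squared gradient receives the same weight.
    return float(np.mean(np.square(grads)))

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
true_second_moment = mu**2 + sigma**2  # E[g^2] for g ~ N(mu, sigma^2)

ema_err, uni_err = [], []
for _ in range(500):  # independent trials
    g = rng.normal(mu, sigma, size=500)  # a stream of 500 noisy scalar gradients
    ema_err.append(ema_second_moment(g) - true_second_moment)
    uni_err.append(uniform_second_moment(g) - true_second_moment)

ema_std, uni_std = np.std(ema_err), np.std(uni_err)
print(f"EMA (beta2=0.9) error std: {ema_std:.3f}, uniform-average error std: {uni_std:.3f}")
```

For i.i.d. samples, the variance of the exponential moving average plateaus at roughly (1 - beta2)/(1 + beta2) times Var(g^2). The variance of the uniform average instead keeps shrinking like 1/k, so the long-memory estimate is much less noisy here. This stationary setting shows only one side of the trade-off. When gradients drift during training, a short memory can track changes faster.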

Abstract

The choice of how to retain information about past gradients dramatically affects the convergence properties of state-of-the-art stochastic optimization methods, such as Heavy-ball, Nesterov's momentum, RMSprop and Adam. Building on this observation, we use stochastic differential equations (SDEs) to explicitly study the role of memory in gradient-based algorithms. We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings. We provide convergence guarantees for this SDE for weakly-quasi-convex and quadratically growing functions. We then demonstrate how to discretize this SDE to get a flexible discrete-time algorithm that can implement a broad spectrum of memories ranging from short- to long-term. Not only does this algorithm increase the degrees of freedom in algorithmic choice for practitioners but it…
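The abstract's premise is that Heavy-ball and Adam differ in how they retain past gradients. This can be made concrete for Heavy-ball: unrolling its velocity recursion gives an exponentially weighted sum of all past gradients. The minimal check below assumes the formulation v_k = beta * v_{k-1} + g_k with v_{-1} = 0. Some implementations scale g_k by (1 - beta) or fold the step size into v, which changes the weights only by a constant factor. The function names are hypothetical.

```python
import numpy as np

def heavy_ball_velocity(grads, beta=0.9):
    # Recursive form of heavy-ball momentum: v_k = beta * v_{k-1} + g_k, with v_{-1} = 0.
    v = np.zeros_like(grads[0])
    for g in grads:
        v = beta * v + g
    return v

def explicit_memory(grads, beta=0.9):
    # Unrolled form: v_k = sum_j beta**(k - j) * g_j, i.e. an exponentially fading memory.
    k = len(grads) - 1
    weights = beta ** (k - np.arange(k + 1))
    return (weights[:, None] * grads).sum(axis=0)

rng = np.random.default_rng(1)
grads = rng.standard_normal((50, 3))  # 50 past gradients in R^3
print(np.allclose(heavy_ball_velocity(grads), explicit_memory(grads)))  # True
```

The recursive form needs only O(1) extra memory, which is why momentum methods are implemented this way. Other memory shapes, such as slowly decaying or uniform weights, generally lack such a one-line recursion and must be designed explicitly.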

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research

Methods: RMSProp · Adam