Memory Augmented Optimizers for Deep Learning

Paul-Aymeric McRae; Prasanna Parthasarathi; Mahmoud Assran; Sarath Chandar

arXiv:2106.10708 · cs.LG · June 22, 2021 · 1 citation

TL;DR

This paper introduces memory-augmented gradient descent optimizers that retain a limited view of their gradient history, leading to faster convergence and better performance on deep learning tasks, with convergence proven for fixed-size memory under strong convexity.

Contribution

It proposes a novel framework for memory-augmented optimizers that selectively retain gradient history, improving efficiency and convergence in large-scale deep learning.

Findings

01 Accelerated convergence on vision and language tasks.

02 Improved performance over standard optimizers.

03 Proven convergence for fixed-size memory under strong convexity.

Abstract

Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or explicitly stored in memory has been shown effective in theory and practice, the question of whether all or only a subset of the gradients in the history are sufficient in deciding the parameter updates remains unanswered. In this paper, we propose a framework of memory-augmented gradient descent optimizers that retain a limited view of their gradient history in their internal memory. Such optimizers scale well to large real-life datasets, and our experiments show that the memory augmented…
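As an illustration of what "a limited view of the gradient history" could look like in practice, the following is a minimal sketch in plain Python. It is not the authors' implementation. The retention rule (keep the gradients with the largest norm, with priorities decayed each step), the aggregation (mean of the stored gradients added to the current gradient), and all names (`GradientMemory`, `memory_sgd`, `capacity`, `decay`) are assumptions made for this example; the abstract as shown here does not specify them.

```python
class GradientMemory:
    """Fixed-size buffer keeping the gradients with the largest decayed norm.

    Hypothetical helper for illustration; the retention rule is an assumption.
    """

    def __init__(self, capacity, decay=0.7):
        self.capacity = capacity
        self.decay = decay
        self.entries = []  # each entry is [priority, gradient]

    def add(self, grad):
        # Age existing entries so stale gradients are eventually evicted.
        for entry in self.entries:
            entry[0] *= self.decay
        priority = sum(g * g for g in grad) ** 0.5  # L2 norm of the new gradient
        if len(self.entries) < self.capacity:
            self.entries.append([priority, list(grad)])
            return
        # Buffer is full: replace the lowest-priority entry if the new one beats it.
        weakest = min(range(len(self.entries)), key=lambda i: self.entries[i][0])
        if priority > self.entries[weakest][0]:
            self.entries[weakest] = [priority, list(grad)]

    def mean(self, dim):
        # Coordinate-wise mean of the stored gradients (zeros if empty).
        if not self.entries:
            return [0.0] * dim
        n = len(self.entries)
        return [sum(e[1][i] for e in self.entries) / n for i in range(dim)]


def memory_sgd(grad_fn, x0, lr=0.1, capacity=3, decay=0.7, steps=200):
    """Gradient descent whose step adds the memory mean to the current gradient."""
    x = list(x0)
    memory = GradientMemory(capacity, decay)
    for _ in range(steps):
        g = grad_fn(x)
        memory.add(g)
        m = memory.mean(len(x))
        x = [xi - lr * (gi + mi) for xi, gi, mi in zip(x, g, m)]
    return x


def quadratic_grad(x):
    # Gradient of the strongly convex toy objective f(x) = 0.5 * ||x||^2.
    return list(x)


x_final = memory_sgd(quadratic_grad, [5.0, -3.0])
```

The toy objective is strongly convex, which loosely mirrors the setting of the stated convergence result, but the sketch makes no claim about matching the paper's update rule or its guarantees. The priority decay is included because, without it, a large early gradient would stay in the buffer indefinitely and bias every later step.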

Code & Models

Videos

Memory Augmented Optimizers for Deep Learning · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM