Memory Augmented Optimizers for Deep Learning
Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

TL;DR
This paper introduces memory-augmented gradient descent optimizers that retain a limited history of past gradients, leading to faster convergence and better performance on deep learning tasks, with a convergence proof for a fixed-size memory under strong convexity.
Contribution
It proposes a novel framework for memory-augmented optimizers that selectively retain gradient history, improving efficiency and convergence in large-scale deep learning.
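The retention idea can be made concrete with a small fixed-capacity buffer. The sketch below is a minimal illustration, not the authors' implementation: the class name `TopNormGradientMemory` is hypothetical, and the rule of keeping the gradients with the largest L2 norm is an assumed selection policy chosen only to show what "selectively retaining" gradients could look like.

```python
import heapq
import itertools
from typing import List, Tuple


class TopNormGradientMemory:
    """Fixed-capacity store that keeps the gradients with the largest L2 norm.

    Hypothetical helper: the norm-based retention rule is an illustrative
    assumption, not necessarily the selection policy used in the paper.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Min-heap of (norm, insertion_id, gradient); the smallest norm sits at index 0.
        self._heap: List[Tuple[float, int, List[float]]] = []
        # Unique tie-breaker so gradient lists are never compared directly.
        self._counter = itertools.count()

    def add(self, grad: List[float]) -> None:
        norm = sum(g * g for g in grad) ** 0.5
        entry = (norm, next(self._counter), list(grad))
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif norm > self._heap[0][0]:
            # Evict the stored gradient with the smallest norm.
            heapq.heapreplace(self._heap, entry)
        # Otherwise the new gradient is discarded.

    def gradients(self) -> List[List[float]]:
        # Stored gradients, largest norm first.
        return [g for _, _, g in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    mem = TopNormGradientMemory(capacity=2)
    for g in ([0.1, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]):
        mem.add(g)
    print(mem.gradients())  # [[3.0, 4.0], [0.0, 2.0]]
```

A min-heap keyed on the norm makes each insertion O(log C) for capacity C, since only the weakest stored gradient ever needs to be inspected or evicted.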
Findings
Accelerated convergence on vision and language tasks.
Improved performance over standard optimizers.
Proven convergence for a fixed-size memory under strong convexity.
Abstract
Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or explicitly stored in memory has been shown effective in theory and practice, the question of whether all or only a subset of the gradients in the history are sufficient in deciding the parameter updates remains unanswered. In this paper, we propose a framework of memory-augmented gradient descent optimizers that retain a limited view of their gradient history in their internal memory. Such optimizers scale well to large real-life datasets, and our experiments show that the memory augmented…
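To show how a limited gradient history can drive the parameter update, here is a minimal sketch on a deterministic, strongly convex quadratic. It assumes a simple first-in, first-out window of the last `memory_size` gradients and uses their plain average as the step direction; the function names, the window policy, and the hyperparameters (`lr=0.02`, `memory_size=4`, `steps=1000`) are hypothetical choices, not the update rule from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


def quadratic_grad(x: Sequence[float], curv: Sequence[float]) -> List[float]:
    # Gradient of f(x) = 0.5 * sum(curv_i * x_i^2), strongly convex when all curv_i > 0.
    return [c * xi for c, xi in zip(curv, x)]


def quadratic_loss(x: Sequence[float], curv: Sequence[float]) -> float:
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))


def windowed_memory_sgd(x0: Sequence[float], curv: Sequence[float],
                        lr: float = 0.02, memory_size: int = 4,
                        steps: int = 1000) -> List[float]:
    """Gradient descent whose step direction is the mean of the last
    `memory_size` gradients (assumed FIFO memory, illustrative only)."""
    x = list(x0)
    # deque(maxlen=...) drops the oldest gradient automatically once full.
    memory: Deque[List[float]] = deque(maxlen=memory_size)
    for _ in range(steps):
        memory.append(quadratic_grad(x, curv))
        # Average the retained gradients coordinate by coordinate.
        direction = [sum(col) / len(memory) for col in zip(*memory)]
        x = [xi - lr * di for xi, di in zip(x, direction)]
    return x


if __name__ == "__main__":
    curv = [1.0, 10.0]
    x_final = windowed_memory_sgd([1.0, 1.0], curv)
    print(quadratic_loss(x_final, curv))
```

Averaging over a short window smooths the step direction much like momentum, at the cost of reacting a few steps late; with a step size that is too large relative to the curvature, that lag can cause oscillation.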
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
