Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

Aaron Defazio; Samy Jelassi

arXiv:2101.11075·cs.LG·August 27, 2021·39 cites

TL;DR

MADGRAD is a new adaptive gradient optimization method in the AdaGrad family that matches or outperforms SGD and ADAM in test set performance across deep learning tasks in vision and natural language processing.

Contribution

MADGRAD introduces a momentumized, dual-averaged adaptive gradient method that improves performance in stochastic optimization for deep learning.

Findings

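01 MADGRAD outperforms SGD and ADAM on multiple deep learning tasks.

02 MADGRAD performs well on vision problems (classification and image-to-image tasks) and NLP problems (recurrent and bidirectionally-masked models).

03 MADGRAD matches or exceeds the test set performance of existing methods, even on problems for which adaptive methods normally perform poorly.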

Abstract

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance, even on problems for which adaptive methods normally perform poorly.
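
The abstract names the method's ingredients (momentum, AdaGrad-style adaptivity, dual averaging) but not the update rule. The block below is a minimal sketch only, assuming the update takes the form usually described for MADGRAD: gradients and squared gradients are accumulated with weight lr * sqrt(k + 1), the dual-averaged point is computed from the initial iterate using a cube-root denominator, and momentum is applied as an average of iterates. None of these details appear on this page, so they should be checked against the paper. The function name `madgrad_style_minimize` and its parameters are hypothetical.

```python
import math


def madgrad_style_minimize(grad_fn, x0, lr=0.1, momentum=0.9, eps=1e-6, steps=200):
    """Illustrative MADGRAD-style loop on a list of floats (assumed form, not the authors' code)."""
    x = list(x0)
    s = [0.0] * len(x)  # weighted sum of gradients (dual averaging state)
    v = [0.0] * len(x)  # weighted sum of squared gradients (adaptivity state)
    c = 1.0 - momentum  # averaging weight for the momentum step

    for k in range(steps):
        g = grad_fn(x)
        lam = lr * math.sqrt(k + 1)  # assumed step weighting
        for i in range(len(x)):
            s[i] += lam * g[i]
            v[i] += lam * g[i] * g[i]
            # Dual-averaged point, measured from the initial iterate x0.
            # The cube root (rather than AdaGrad's square root) is an assumption.
            z_i = x0[i] - s[i] / (v[i] ** (1.0 / 3.0) + eps)
            # Momentum as a moving average of the iterate toward z.
            x[i] = (1.0 - c) * x[i] + c * z_i
    return x


def quadratic_grad(target):
    """Gradient of 0.5 * ||x - target||^2, used here only as a toy problem."""
    return lambda x: [xi - ti for xi, ti in zip(x, target)]


if __name__ == "__main__":
    print(madgrad_style_minimize(quadratic_grad([3.0, -2.0]), [0.0, 0.0]))
```

On the toy quadratic above, the iterate moves from the starting point toward the target. The sketch says nothing about behavior on the deep learning tasks reported in the abstract.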

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Momentumized, adaptive, dual averaged gradient · Stochastic Gradient Descent · Adam · AdaGrad