Adaptive Gradient Methods with Local Guarantees

Zhou Lu; Wenhan Xia; Sanjeev Arora; Elad Hazan

arXiv:2203.01400 · cs.LG · January 27, 2023

TL;DR

This paper introduces an adaptive gradient method that learns a local preconditioner, one that can change as the data changes along the optimization trajectory. In a single run, without a manually tuned learning rate schedule, it reaches task accuracy comparable to that of a fine-tuned optimizer on vision and language benchmarks.

Contribution

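It proposes a new adaptive gradient algorithm with provable adaptive regret guarantees against the best local preconditioner. The guarantee rests on a new adaptive regret bound for online learning that improves upon previous adaptive online learning methods.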

Findings

01 Achieves task accuracy comparable to a fine-tuned optimizer in a single run

02 Automatically chooses the learning rate schedule, with no manual tuning

03 Demonstrates robustness on popular vision and language benchmarks

Abstract

Adaptive gradient methods are the method of choice for optimization in machine learning and are used to train the largest deep models. In this paper we study the problem of learning a local preconditioner that can change as the data changes along the optimization trajectory. We propose an adaptive gradient method that has provable adaptive regret guarantees against the best local preconditioner. To derive this guarantee, we prove a new adaptive regret bound in online learning that improves upon previous adaptive online learning methods. We demonstrate the robustness of our method in automatically choosing the optimal learning rate schedule for popular benchmarking tasks in the vision and language domains. Without the need to manually tune a learning rate schedule, our method can, in a single run, achieve task accuracy that is comparable to, and as stable as, that of a fine-tuned optimizer.
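
For readers unfamiliar with the term "preconditioner", the sketch below shows the standard diagonal AdaGrad update, in which each coordinate's step is rescaled by the root of its accumulated squared gradients. This is not the paper's algorithm: AdaGrad accumulates over the whole trajectory, whereas the abstract concerns a preconditioner that can change locally. The function names, the toy quadratic objective, and the hyperparameter values are all assumptions chosen for illustration.

```python
import math

def adagrad_step(x, grad, accum, lr=0.5, eps=1e-8):
    """One diagonal-AdaGrad update; returns (new_x, new_accum).

    accum holds the running sum of squared gradients per coordinate;
    its square root acts as a diagonal preconditioner on the step.
    """
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_x = [xi - lr * g / (math.sqrt(a) + eps)
             for xi, g, a in zip(x, grad, new_accum)]
    return new_x, new_accum

def quadratic_grad(x, scales):
    """Gradient of the toy objective f(x) = 0.5 * sum(scales[i] * x[i]**2)."""
    return [s * xi for s, xi in zip(scales, x)]

def run_adagrad(x0, scales, steps=200, lr=0.5):
    """Run AdaGrad on the toy quadratic (hypothetical helper)."""
    x, accum = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, accum = adagrad_step(x, quadratic_grad(x, scales), accum, lr=lr)
    return x

# Two coordinates whose curvatures differ by a factor of 100.
first_x, first_accum = adagrad_step([1.0, 1.0],
                                    quadratic_grad([1.0, 1.0], [100.0, 1.0]),
                                    [0.0, 0.0])
final_x = run_adagrad([1.0, 1.0], [100.0, 1.0])
```

On this toy problem the raw gradients differ by a factor of 100, yet the preconditioned first step moves both coordinates by the same amount. That per-coordinate rescaling is what lets adaptive methods cope with badly scaled problems without a hand-tuned step size per coordinate.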

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques