Learning to Accelerate by the Methods of Step-size Planning

Hengshuai Yao

arXiv:2204.01705 · cs.LG · May 27, 2022

TL;DR

This paper introduces step-size planning methods that leverage past gradient update experiences to learn and predict optimal step-sizes, significantly accelerating convergence for both convex and non-convex problems.

Contribution

It proposes a novel class of step-size planning algorithms, including Csawg, that outperform existing methods and surpass theoretical convergence limits in certain scenarios.

Findings

01 Csawg achieves faster convergence than Nesterov's method on convex problems.

02 Planning methods reach zero error on the Rosenbrock function with fewer evaluations.

03 The approach extends to multi-step planning, further improving speed.

Abstract

Gradient descent is slow to converge on ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods for accelerating gradient descent that differ from existing techniques. The new methods, which we call step-size planning, use the update experience to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of…
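To make the idea of step-size adaptation concrete, here is a minimal sketch of one of the reviewed methods, the Polyak step-size, next to fixed-step gradient descent. It is not taken from the paper and it is not Csawg. The two-dimensional quadratic, its condition number `kappa = 100`, the starting point, and the function names are all assumptions chosen for illustration. The Polyak rule sets the step to (f(x) - f*) / ||g||^2 and so requires the optimal value f*, which is known to be 0 here.

```python
# Illustrative only: an ill-conditioned quadratic f(x) = 0.5 * (x1^2 + kappa * x2^2).
# The problem, kappa, and all names below are assumptions, not the paper's setup.

def loss(x, kappa=100.0):
    return 0.5 * (x[0] ** 2 + kappa * x[1] ** 2)

def grad(x, kappa=100.0):
    return [x[0], kappa * x[1]]

def fixed_step_gd(x0, steps, alpha):
    """Plain gradient descent with a constant step-size alpha."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

def polyak_gd(x0, steps, f_star=0.0):
    """Gradient descent with the Polyak step-size (f(x) - f*) / ||g||^2."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:  # already at a stationary point
            break
        alpha = (loss(x) - f_star) / g_sq
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

x0 = [1.0, 1.0]
# 1/L with L = kappa = 100 is the usual safe constant step for this quadratic.
x_fixed = fixed_step_gd(x0, steps=100, alpha=1.0 / 100.0)
x_polyak = polyak_gd(x0, steps=100)
print("fixed step 1/L:", loss(x_fixed))
print("Polyak step:   ", loss(x_polyak))
```

With the constant step 1/L, the well-scaled coordinate shrinks by only a factor of 0.99 per iteration, which is the slow convergence on ill-conditioned problems that the abstract refers to. The Polyak rule instead recomputes the step at every iteration from the current loss and gradient. Which of the two ends up lower after a given number of steps depends on the problem, so the script prints both rather than assuming a winner.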

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques

Methods: Adam