Gradient descent with generalized Newton's method

Zhiqi Bu; Shiyun Xu

arXiv:2407.02772·cs.LG·May 20, 2025

TL;DR

The paper introduces the generalized Newton's method (GeN), a Hessian-informed approach that automatically selects the learning rate for any base optimizer, such as SGD or Adam, speeding up convergence without intensive tuning of a learning rate scheduler.

Contribution

The paper presents GeN, a method that wraps existing optimizers and dynamically selects the learning rate, speeding up convergence while remaining easy to implement across different models and tasks.

Findings

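01. GeN optimizers match state-of-the-art performance on language and vision tasks, performance that was previously achieved with carefully tuned learning rate schedulers.

02. GeN requires only additional forward passes, with almost zero overhead in training time and memory when amortized over many iterations, and no intensive tuning of the learning rate scheduler.

03. Experiments cover language and vision models such as GPT and ResNet.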

Abstract

We propose the generalized Newton's method (GeN) -- a Hessian-informed approach that applies to any optimizer such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects the learning rate that accelerates the convergence, without the intensive tuning of the learning rate scheduler. In practice, our method is easily implementable, since it only requires additional forward passes with almost zero computational overhead (in terms of training time and memory cost), if the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers.
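
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the learning rate is chosen by fitting a parabola to the loss along the optimizer's update direction using two extra forward passes, which is one way to obtain a Hessian-informed step size without ever forming the Hessian. The function name `quadratic_fit_lr`, the probe step `probe_lr`, the fallback rule, and the toy quadratic loss are all hypothetical choices made for illustration.

```python
import numpy as np


def loss(w, A, b):
    """Toy quadratic loss 0.5 * w^T A w - b^T w (illustrative stand-in for a model loss)."""
    return 0.5 * w @ A @ w - b @ w


def grad(w, A, b):
    """Exact gradient of the toy quadratic loss."""
    return A @ w - b


def quadratic_fit_lr(loss_fn, w, direction, probe_lr):
    """Pick a learning rate by fitting a parabola to the loss along `direction`.

    Uses the expansion L(w - lr * d) ~ L(w) - lr * (G.d) + 0.5 * lr^2 * (d^T H d),
    estimating G.d and d^T H d from three loss evaluations (forward passes only).
    Hypothetical helper: name, arguments, and fallback are assumptions.
    """
    l0 = loss_fn(w)
    l_plus = loss_fn(w - probe_lr * direction)   # loss after a small step
    l_minus = loss_fn(w + probe_lr * direction)  # loss after a small step backwards
    slope = (l_minus - l_plus) / (2.0 * probe_lr)               # estimates G.d
    curvature = (l_plus + l_minus - 2.0 * l0) / probe_lr ** 2   # estimates d^T H d
    if curvature <= 0.0 or slope <= 0.0:
        # Assumed fallback: keep the probe step when the parabola has no minimum
        # in the descent direction.
        return probe_lr
    return slope / curvature  # minimizer of the fitted parabola


# Build a small symmetric positive definite problem.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)
b = rng.standard_normal(5)
w = rng.standard_normal(5)


def loss_fn(v):
    return loss(v, A, b)


# Case 1: plain gradient direction (as in SGD).
g = grad(w, A, b)
eta = quadratic_fit_lr(loss_fn, w, g, probe_lr=1e-2)
eta_exact = (g @ g) / (g @ A @ g)  # closed form for a quadratic loss
w_next = w - eta * g

# Case 2: Newton direction H^{-1} G, for which the fitted learning rate should be about 1.
d_newton = np.linalg.solve(A, g)
eta_newton = quadratic_fit_lr(loss_fn, w, d_newton, probe_lr=1e-2)

print("fitted lr (gradient direction):", eta, "closed form:", eta_exact)
print("fitted lr (Newton direction):", eta_newton)
print("loss before:", loss_fn(w), "loss after:", loss_fn(w_next))
```

On this exactly quadratic toy loss, the fitted learning rate for the gradient direction agrees with the closed-form value, and for the Newton direction it is approximately 1, consistent with the abstract's statement that the Newton-Raphson method is covered as a sub-case. In real training, the abstract notes that the cost of the extra forward passes becomes almost zero when amortized over many iterations, for example by refreshing the learning rate only occasionally.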

Code & Models

Repositories

shiyunxu/autogen (PyTorch, official)

Taxonomy

Topics: Advanced Numerical Analysis Techniques · Iterative Methods for Nonlinear Equations · Reservoir Engineering and Simulation Methods

Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Warmup With Cosine Annealing · Linear Layer · Attention Dropout · Dropout · Dense Connections