Exact Gauss-Newton Optimization for Training Deep Neural Networks

Mikalai Korbit; Adeyemi D. Adeoye; Alberto Bemporad; Mario Zanon

arXiv:2405.14402·cs.LG·October 16, 2025

TL;DR

This paper introduces Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm for deep neural networks. EGN computes the Gauss-Newton descent direction with low-rank linear algebra, factorizing only a matrix the size of the mini-batch, and performs competitively with existing optimizers.

Contribution

The paper proposes EGN, a stochastic second-order method that combines the generalized Gauss-Newton Hessian approximation with low-rank linear algebra, and proves that it converges in expectation to a stationary point of the objective under mild assumptions.

Findings

1. EGN outperforms or matches popular optimizers across tasks.
2. EGN efficiently handles large-scale models with small batch sizes.
3. Incorporating line search and momentum accelerates convergence.
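
To make the line-search finding concrete, here is a minimal sketch of Armijo backtracking in JAX. It is a generic textbook rule used here as an assumption, not the specific line search from the paper or from somax; the function name, the constants `c` and `shrink`, and the toy quadratic loss are all illustrative.

```python
import jax
import jax.numpy as jnp

def armijo_backtracking(loss, w, d, alpha0=1.0, c=1e-4, shrink=0.5, max_iter=30):
    """Shrink the step size until the Armijo sufficient-decrease test passes.

    Hypothetical helper for illustration; returns the last step size tried
    if no step passes within max_iter attempts.
    """
    f0, g = jax.value_and_grad(loss)(w)
    slope = jnp.vdot(g, d)  # negative when d is a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if loss(w + alpha * d) <= f0 + c * alpha * slope:
            break
        alpha *= shrink
    return alpha

# Toy quadratic loss (an assumption), with steepest descent as the direction.
def loss(w):
    return 5.0 * jnp.sum(w ** 2)

w0 = jnp.array([1.0, -2.0, 0.5])
d0 = -jax.grad(loss)(w0)
alpha = armijo_backtracking(loss, w0, d0)
w1 = w0 + alpha * d0
```

In a second-order method, `d0` would be replaced by the Gauss-Newton direction; the backtracking loop itself does not change.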

Abstract

We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that…
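
The batch-sized factorization described in the abstract can be illustrated in the least-squares case. The following is a minimal sketch, assuming a squared-error loss with residual vector `r`, a Jacobian `J` of shape (batch, params), and a fixed damping value `lam`; the function names and the toy model are illustrative and not taken from the paper or from somax. It relies on the identity (JᵀJ + lam·I)⁻¹Jᵀr = Jᵀ(JJᵀ + lam·I)⁻¹r, so the damped Gauss-Newton direction can be found by solving a system whose size is the batch size rather than the parameter count.

```python
import jax
import jax.numpy as jnp

def residuals(w, X, y):
    # Toy model (an assumption for illustration): one tanh output per sample.
    return jnp.tanh(X @ w) - y

def gn_direction_batch_sized(J, r, lam):
    # Solve a (batch x batch) system instead of a (params x params) one.
    b = J.shape[0]
    z = jnp.linalg.solve(J @ J.T + lam * jnp.eye(b), r)
    return -J.T @ z

def gn_direction_naive(J, r, lam):
    # Reference: solve the full (params x params) damped normal equations.
    n = J.shape[1]
    return -jnp.linalg.solve(J.T @ J + lam * jnp.eye(n), J.T @ r)

key_X, key_y, key_w = jax.random.split(jax.random.PRNGKey(0), 3)
batch, n_params = 4, 50
X = jax.random.normal(key_X, (batch, n_params))
y = jax.random.normal(key_y, (batch,))
w = 0.1 * jax.random.normal(key_w, (n_params,))

r = residuals(w, X, y)
J = jax.jacobian(residuals)(w, X, y)  # shape (batch, n_params)
lam = 1.0  # illustrative damping value

d_fast = gn_direction_batch_sized(J, r, lam)
d_naive = gn_direction_naive(J, r, lam)
```

Both functions return the same direction up to floating-point error. The first one only factorizes a 4 x 4 matrix here, which is where the saving comes from when the parameter count is much larger than the batch size.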

Code & Models

Repositories

cor3bit/somax (JAX, official)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Inertial Sensor and Navigation · Statistical and numerical algorithms

Methods: Stochastic Gradient Descent · Adam