Exact Gauss-Newton Optimization for Training Deep Neural Networks
Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon

TL;DR
This paper introduces Exact Gauss-Newton (EGN), a second-order optimization algorithm for deep neural networks. EGN computes descent directions efficiently by pairing the generalized Gauss-Newton Hessian approximation with low-rank linear algebra, and it performs competitively with existing optimizers.
Contribution
The paper proposes EGN, a stochastic second-order method that combines the Gauss-Newton Hessian approximation with low-rank linear algebra. It also proves that EGN converges in expectation to a stationary point under mild assumptions.
Findings
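EGN outperforms or matches popular optimizers across the tasks evaluated.
EGN is efficient for large models trained with small mini-batches, since it only factorizes a matrix the size of the mini-batch.
Adding line search, adaptive regularization, and momentum further accelerates convergence.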
Abstract
We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that…
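The following is a minimal sketch of the low-rank idea. It assumes a sum-of-squares loss 0.5·||r(w)||² on a mini-batch of b residuals, with Jacobian J of shape b × n and b ≪ n. Under that assumption, the regularized Gauss-Newton direction can be computed from a b × b system instead of an n × n one, because (JᵀJ + λI_n)⁻¹Jᵀ = Jᵀ(JJᵀ + λI_b)⁻¹. This equality is a special case of the Duncan-Guttman identity. The function names, the damping value λ, and the random test problem are illustrative assumptions. This is not the authors' implementation, which covers general losses.

```python
import numpy as np

def gn_direction_full(J, r, lam):
    # Reference path: solve the n-by-n regularized Gauss-Newton system directly.
    n = J.shape[1]
    g = J.T @ r                       # gradient of 0.5 * ||r||^2
    H = J.T @ J + lam * np.eye(n)     # regularized GN Hessian approximation
    return -np.linalg.solve(H, g)

def gn_direction_lowrank(J, r, lam):
    # Same direction, but only a b-by-b matrix (b = batch size) is factorized,
    # using (J^T J + lam I_n)^{-1} J^T = J^T (J J^T + lam I_b)^{-1}.
    b = J.shape[0]
    K = J @ J.T + lam * np.eye(b)     # symmetric positive definite for lam > 0
    L = np.linalg.cholesky(K)         # K = L L^T
    z = np.linalg.solve(L.T, np.linalg.solve(L, r))  # z = K^{-1} r
    return -J.T @ z

# Illustrative problem: 8 residuals, 500 parameters (hypothetical sizes).
rng = np.random.default_rng(0)
b, n = 8, 500
J = rng.standard_normal((b, n))
r = rng.standard_normal(b)
lam = 1e-2

d_full = gn_direction_full(J, r, lam)
d_low = gn_direction_lowrank(J, r, lam)
print(np.allclose(d_full, d_low))  # True
```

The low-rank path costs roughly O(b²n + b³), while the direct solve costs O(n³). This gap is why factorizing a batch-sized matrix pays off when the parameter count n is far larger than the batch size b.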
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Inertial Sensor and Navigation · Statistical and numerical algorithms
Methods: Stochastic Gradient Descent · Adam
