Practical Gauss-Newton Optimisation for Deep Learning
Aleksandar Botev, Hippolyt Ritter, David Barber

TL;DR
This paper introduces an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. The resulting optimizer is competitive with, and sometimes significantly better than, state-of-the-art first-order methods, and it needs less hyperparameter tuning.
Contribution
It proposes a Gauss-Newton-based optimization algorithm for feedforward neural networks, built on an efficient block-diagonal approximation to the Gauss-Newton matrix, that can perform well even with default settings.
Findings
The method is competitive with state-of-the-art first-order optimizers.
It sometimes yields significant improvements in optimization performance.
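With piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimization.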
Abstract
We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
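To make the term "block-diagonal approximation to the Gauss-Newton matrix" concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it forms the full Gauss-Newton matrix G = J^T H J explicitly for a toy network and then zeroes the cross-layer blocks, which is only feasible at this tiny scale. The network (f = w2 * relu(w1 . x)), the squared-error loss (so the output curvature H is 1), and all function names are assumptions made for illustration.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def relu_grad(z):
    # Derivative of ReLU wherever it is differentiable (z != 0).
    return 1.0 if z > 0.0 else 0.0


def output_jacobian(w1, w2, x):
    """Jacobian of the scalar output f = w2 * relu(w1 . x)
    with respect to the flattened parameters [w1..., w2]."""
    pre = sum(wi * xi for wi, xi in zip(w1, x))
    d_w1 = [w2 * relu_grad(pre) * xi for xi in x]  # layer 1 parameters
    d_w2 = relu(pre)                               # layer 2 parameter
    return d_w1 + [d_w2]


def gauss_newton(jac, loss_hessian=1.0):
    """G = J^T H J for a scalar output. H is the curvature of the loss
    with respect to the output (1.0 for the loss 0.5 * (f - y)**2)."""
    return [[ji * loss_hessian * jj for jj in jac] for ji in jac]


def block_diagonal(matrix, block_sizes):
    """Keep only the per-layer diagonal blocks; zero all cross-layer terms."""
    owner = []  # owner[i] = index of the layer that parameter i belongs to
    for layer, size in enumerate(block_sizes):
        owner.extend([layer] * size)
    n = len(matrix)
    return [
        [matrix[i][j] if owner[i] == owner[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy values: two inputs, one hidden unit, one output.
jac = output_jacobian(w1=[0.5, 0.25], w2=3.0, x=[1.0, 2.0])
G = gauss_newton(jac)
G_block = block_diagonal(G, block_sizes=[2, 1])  # layer 1 has 2 params, layer 2 has 1

for row in G_block:
    print(row)
```

In this sketch the layer-1 block is 2x2 and the layer-2 block is 1x1; the entries of G that couple the two layers are dropped. For real networks the full matrix is far too large to build this way, which is why an efficient approximation of the blocks is the subject of the paper.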
Taxonomy
Topics: Blind Source Separation Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
