Practical Gauss-Newton Optimisation for Deep Learning

Aleksandar Botev; Hippolyt Ritter; David Barber

arXiv:1706.03662 · stat.ML · June 14, 2017 · ICML · 34 cites


Open Access

TL;DR

This paper introduces an efficient block-diagonal Gauss-Newton approximation for neural network optimization, achieving competitive or superior results to first-order methods with less hyperparameter tuning.

Contribution

It proposes a Gauss-Newton-based optimization algorithm for feedforward neural networks, built on an efficient block-diagonal approximation to the Gauss-Newton matrix, that can perform well even with default settings.

Findings

01

The method is competitive with state-of-the-art first-order optimizers.

02

In some cases it improves optimization performance significantly.

03

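With piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimization.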

Abstract

We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
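
To make the idea of a block-diagonal Gauss-Newton matrix concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: it builds each layer's Gauss-Newton block densely for a tiny two-layer ReLU network with a squared-error loss, which is only feasible at toy scale. The network, the helper names (`jacobian_blocks`, `block_diagonal_gn_step`), the damping value, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def jacobian_blocks(W1, W2, x):
    """Jacobians of f(x) = W2 @ relu(W1 @ x) w.r.t. row-major vec(W1) and vec(W2)."""
    pre = W1 @ x
    mask = (pre > 0).astype(float)                  # ReLU derivative (0 or 1)
    a = pre * mask                                  # hidden activations
    J1 = np.kron(W2 * mask[None, :], x[None, :])    # shape (o, h*d)
    J2 = np.kron(np.eye(W2.shape[0]), a[None, :])   # shape (o, o*h)
    return J1, J2, W2 @ a

def block_diagonal_gn_step(W1, W2, x, y, damping=1e-2):
    """Damped Gauss-Newton step per layer, ignoring cross-layer blocks.

    Assumes the loss 0.5 * ||f - y||^2, whose Hessian w.r.t. the output f
    is the identity, so each layer block is simply J_l^T J_l.
    """
    J1, J2, f = jacobian_blocks(W1, W2, x)
    residual = f - y                                # gradient of the loss w.r.t. f
    steps, blocks, grads = [], [], []
    for J in (J1, J2):
        G = J.T @ J                                 # Gauss-Newton block (PSD)
        g = J.T @ residual                          # gradient for this layer
        delta = -np.linalg.solve(G + damping * np.eye(G.shape[0]), g)
        steps.append(delta)
        blocks.append(G)
        grads.append(g)
    return steps, blocks, grads

# Hypothetical toy data: 3 inputs, 4 hidden units, 2 outputs.
x = np.array([1.0, -2.0, 0.5])
y = np.array([1.0, 0.0])
W1 = np.array([[0.5, -0.3, 0.8],
               [-0.2, 0.4, 0.1],
               [0.7, 0.1, -0.5],
               [0.3, 0.6, 0.2]])
W2 = np.array([[0.2, -0.4, 0.6, 0.1],
               [-0.5, 0.3, 0.2, 0.7]])

steps, blocks, grads = block_diagonal_gn_step(W1, W2, x, y)
for name, G, g, d in zip(("layer 1", "layer 2"), blocks, grads, steps):
    print(name, "block shape:", G.shape, "directional derivative:", float(g @ d))
```

Each block has the form J^T J and is therefore positive semi-definite, so with positive damping every per-layer step is a descent direction for its layer's parameters. The sketch says nothing about how the paper makes the approximation efficient for real networks.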


Taxonomy

Topics: Blind Source Separation Techniques · Neural Networks and Applications · Model Reduction and Neural Networks