Analysis of Natural Gradient Descent for Multilayer Neural Networks
Magnus Rattray, David Saad

TL;DR
This paper analyzes on-line natural gradient descent for multilayer neural networks, showing that it achieves optimal asymptotic performance and shortens or even removes the plateaus that slow standard gradient descent training.
Contribution
It provides a theoretical analysis of natural gradient descent dynamics for multilayer neural networks in the limit of large input dimension, highlighting its advantages over standard gradient descent.
Findings
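Natural gradient descent achieves optimal asymptotic performance and outperforms gradient descent in the transient.
It significantly shortens or even removes the plateaus in transient generalization performance that typically hamper gradient descent training.
The analysis uses methods of statistical physics and solves the learning dynamics in the limit of large input dimension.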
Abstract
Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing plateaus in the transient generalization performance which typically hamper gradient descent training.
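The idea of redefining the direction of steepest descent can be illustrated with a small sketch. It is not the multilayer setting analyzed in the paper: it assumes a two-parameter linear model with unit-variance Gaussian output noise, for which the Fisher information matrix equals the second-moment matrix of the inputs, and it uses the expected gradient rather than on-line samples. All names (train, COV, W_STAR, and so on) are hypothetical. The natural gradient step premultiplies the gradient by the inverse Fisher matrix, which removes the dependence on input scaling. The sketch does not reproduce the symmetry-related plateaus discussed in the abstract.

```python
# Illustrative sketch only: a 2-parameter linear model, NOT the multilayer
# network analyzed in the paper. All names here are hypothetical.

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def squared_error(w, w_star):
    """Squared distance between current and target parameters."""
    return sum((a - b) ** 2 for a, b in zip(w, w_star))

def train(w0, w_star, cov, eta, steps, natural):
    # Assumption: y = w.x + unit-variance Gaussian noise with zero-mean inputs,
    # so the Fisher information matrix equals the input covariance `cov`.
    fisher_inv = inverse_2x2(cov)
    w = list(w0)
    for _ in range(steps):
        diff = [w[0] - w_star[0], w[1] - w_star[1]]
        grad = mat_vec(cov, diff)  # expected gradient of 0.5 * (y - w.x)^2
        # Natural gradient: rescale the gradient by the inverse Fisher matrix.
        step = mat_vec(fisher_inv, grad) if natural else grad
        w = [w[i] - eta * step[i] for i in range(2)]
    return w

COV = [[100.0, 0.0], [0.0, 1.0]]  # badly scaled inputs
W_STAR = [1.0, -2.0]              # target ("teacher") parameters
W0 = [0.0, 0.0]

# Plain gradient descent: the learning rate is limited by the largest
# eigenvalue of COV (stability requires eta < 2/100).
w_gd = train(W0, W_STAR, COV, eta=0.01, steps=50, natural=False)
# Natural gradient descent: the same step size works in every direction.
w_ng = train(W0, W_STAR, COV, eta=0.5, steps=50, natural=True)

print("gradient descent error:        ", squared_error(w_gd, W_STAR))
print("natural gradient descent error:", squared_error(w_ng, W_STAR))
```

In this toy problem plain gradient descent converges quickly along the high-variance input direction but slowly along the other one, while the natural gradient update contracts both directions at the same rate.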
