Analysis of Natural Gradient Descent for Multilayer Neural Networks

Magnus Rattray; David Saad

arXiv:cond-mat/9901212·cond-mat.dis-nn·October 31, 2009

TL;DR

This paper analyzes on-line natural gradient descent for multilayer neural networks, showing that it achieves optimal asymptotic performance and outperforms standard gradient descent in the transient by shortening or even removing plateaus.

Contribution

It provides a theoretical analysis of natural gradient descent dynamics for multilayer neural networks in the limit of large input dimension, characterizing both transient and asymptotic behavior and comparing them with standard gradient descent.

Findings

01

Natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient.

02

It significantly shortens or even removes the plateaus in transient generalization performance that typically hamper gradient descent training.

03

The analysis uses methods of statistical physics to solve the learning dynamics in the limit of large input dimension.

Abstract

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing plateaus in the transient generalization performance which typically hamper gradient descent training.
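To make the idea of redefining the steepest-descent direction concrete, the following is a minimal sketch and not the paper's setup. It assumes a toy linear student and teacher rather than a multilayer network, Gaussian inputs with a known anisotropic covariance, and unit-variance Gaussian output noise in the statistical model, so that the Fisher information matrix equals the input covariance. The function name `run_online`, the learning rate, and all numerical values are illustrative choices. The natural gradient step premultiplies the ordinary on-line gradient by the inverse Fisher matrix.

```python
import numpy as np

def run_online(natural, steps=500, lr=0.1, seed=0):
    """On-line learning of a linear teacher; returns the final generalization error.

    Assumed toy setting (not from the paper): inputs x ~ N(0, cov) and a linear
    student w.x, for which the Fisher information matrix is simply cov.
    """
    rng = np.random.default_rng(seed)
    cov = np.diag([1.0, 0.01])          # anisotropic input covariance (assumed)
    fisher_inv = np.linalg.inv(cov)     # inverse Fisher matrix for this toy model
    w_teacher = np.array([1.0, 1.0])    # hypothetical teacher weights
    w = np.zeros(2)                     # student starts at the origin

    for _ in range(steps):
        x = rng.multivariate_normal(np.zeros(2), cov)
        # Gradient of the squared error 0.5 * (w.x - w_teacher.x)^2 on one example.
        grad = (w @ x - w_teacher @ x) * x
        if natural:
            # Natural gradient: rescale by the inverse Fisher information.
            grad = fisher_inv @ grad
        w = w - lr * grad

    diff = w - w_teacher
    # Generalization error: expected squared error over the input distribution.
    return 0.5 * diff @ cov @ diff

gd_error = run_online(natural=False)
ngd_error = run_online(natural=True)
print(f"gradient descent error:         {gd_error:.3e}")
print(f"natural gradient descent error: {ngd_error:.3e}")
```

In this toy problem plain gradient descent learns slowly along the low-variance input direction, while the natural gradient step converges at the same rate in every direction. This only illustrates the rescaling by the inverse Fisher matrix; the plateaus analyzed in the paper arise in multilayer networks, which this linear sketch does not capture.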
