Fisher Information and Natural Gradient Learning of Random Deep Networks
Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

TL;DR
This paper analyzes the Fisher information matrix in random deep neural networks, showing it is block-diagonal and deriving an explicit inverse, enabling faster natural gradient learning without matrix inversion.
Contribution
It provides a theoretical analysis of the Fisher information matrix in random neural networks, justifying quasi-diagonal approximations and deriving explicit formulas for natural gradient computation.
Findings
The Fisher information matrix of a random network is approximately block-diagonal, with one block per unit.
An explicit inverse of the Fisher information matrix of a single unit is derived.
The natural gradient can therefore be computed efficiently, without inverting the full matrix.
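A minimal sketch of why a block-diagonal Fisher matrix with structured blocks makes the natural gradient cheap. It assumes, purely for illustration, that each unit's block has the form D + U Uᵀ with D diagonal and U of low rank, and applies the standard Woodbury identity; this is not the paper's own inverse formula. The names `woodbury_solve` and `blockwise_natural_gradient`, the block sizes, and the random data are all hypothetical.

```python
import numpy as np

def woodbury_solve(d, U, g):
    """Solve (diag(d) + U @ U.T) x = g using only a k x k solve, k = U.shape[1].

    Assumed block structure (illustrative): diagonal plus low-rank.
    """
    Dinv_g = g / d                      # D^{-1} g, elementwise
    Dinv_U = U / d[:, None]             # D^{-1} U, row-wise scaling
    k = U.shape[1]
    small = np.eye(k) + U.T @ Dinv_U    # k x k matrix I + U^T D^{-1} U
    return Dinv_g - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_g)

def blockwise_natural_gradient(blocks, grads):
    """Apply the inverse of a block-diagonal matrix one unit at a time."""
    return [woodbury_solve(d, U, g) for (d, U), g in zip(blocks, grads)]

# Hypothetical toy data: 3 units, 5 parameters each, rank-2 correction.
rng = np.random.default_rng(0)
n_units, n_in = 3, 5
blocks = [
    (rng.uniform(0.5, 2.0, n_in), rng.normal(size=(n_in, 2)))
    for _ in range(n_units)
]
grads = [rng.normal(size=n_in) for _ in range(n_units)]

nat_grads = blockwise_natural_gradient(blocks, grads)

# Sanity check against a dense solve of each block.
max_err = max(
    np.max(np.abs(np.linalg.solve(np.diag(d) + U @ U.T, g) - x))
    for (d, U), g, x in zip(blocks, grads, nat_grads)
)
```

Under these assumptions the cost per unit is linear in the number of incoming weights, plus one small k x k solve, instead of cubic in the total number of parameters.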
Abstract
A deep neural network is a hierarchical nonlinear model transforming input signals to output signals. Its input-output relation is considered to be stochastic, being described for a given input by a parameterized conditional probability distribution of outputs. The space of parameters consisting of weights and biases is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, so it is effective in learning, avoiding plateaus. It requires inversion of the Fisher information matrix, however, which is practically impossible when the matrix has a huge number of dimensions. Many methods for approximating the natural gradient have therefore been introduced. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a…
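In symbols, the quantities named in the abstract can be written in their standard textbook form. The notation here (parameters θ, Fisher matrix G, loss L, learning rate η) is assumed for illustration and is not taken from the paper.

```latex
G(\theta) = \mathbb{E}_{x}\,\mathbb{E}_{y \sim p(y \mid x;\theta)}
  \left[ \nabla_\theta \log p(y \mid x;\theta)\,
         \nabla_\theta \log p(y \mid x;\theta)^{\top} \right],
\qquad
\theta_{t+1} = \theta_t - \eta\, G(\theta_t)^{-1}\, \nabla_\theta L(\theta_t)
```

The factor G(θ)⁻¹ is the inversion the abstract describes as impractical when the number of parameters is large.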
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
