Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

TL;DR
Using mean field theory on wide, randomly weighted deep neural networks, this paper shows that the Fisher Information Matrix has universal eigenvalue statistics: most eigenvalues are close to zero while the largest is very large, with consequences for generalization measures and learning rate selection.
Contribution
The paper applies mean field theory, in the limit of large width and random weights, to the Fisher Information Matrix (FIM) of deep neural networks, and derives asymptotic eigenvalue statistics that hold across a wide class of architectures.
Findings
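Most FIM eigenvalues are close to zero, so the parameter space is locally flat in most directions.
The maximum eigenvalue is very large, so the parameter space is strongly distorted in a few directions.
The derived statistics can inform learning strategies, including a norm-based capacity measure of generalization and learning rate selection.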
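The first two findings can be checked numerically on a toy network. The sketch below is not the paper's code: it assumes a scalar-output regression model with unit-variance Gaussian noise, so that the empirical FIM is F = (1/N) sum_n g_n g_n^T with g_n the gradient of the output with respect to all parameters. The one-hidden-layer tanh architecture, the sizes M and N, and the scales SIGMA_W and SIGMA_B are illustrative choices. Because an empirical FIM built from N samples has rank at most N, many eigenvalues are exactly zero whenever N < P; the sketch therefore compares the mean eigenvalue (from the trace) with the largest one (from power iteration on the N x N dual matrix).

```python
import math
import random

random.seed(0)

M = 30          # input dimension and hidden width (toy value, assumed)
N = 20          # number of input samples (assumed)
SIGMA_W = 1.0   # weight scale; each weight has variance SIGMA_W**2 / M
SIGMA_B = 0.1   # bias scale

# Random parameters of f(x) = w2 . tanh(W1 x + b1) + b2
W1 = [[random.gauss(0.0, SIGMA_W / math.sqrt(M)) for _ in range(M)] for _ in range(M)]
b1 = [random.gauss(0.0, SIGMA_B) for _ in range(M)]
w2 = [random.gauss(0.0, SIGMA_W / math.sqrt(M)) for _ in range(M)]
# b2 only enters through df/db2 = 1, so its value is never needed.


def grad_f(x):
    """Gradient of the scalar output with respect to all parameters."""
    h = [math.tanh(sum(W1[j][i] * x[i] for i in range(M)) + b1[j]) for j in range(M)]
    delta = [w2[j] * (1.0 - h[j] ** 2) for j in range(M)]      # df/d(pre-activation j)
    g = [delta[j] * x[i] for j in range(M) for i in range(M)]  # df/dW1
    g += delta      # df/db1
    g += h          # df/dw2
    g.append(1.0)   # df/db2
    return g


X = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
G = [grad_f(x) for x in X]   # N gradients, each of length P
P = len(G[0])

# Mean eigenvalue of F = (1/N) sum_n g_n g_n^T equals trace(F) / P.
mean_eig = sum(v * v for g in G for v in g) / (N * P)

# F shares its non-zero eigenvalues with the N x N matrix K = G G^T / N.
K = [[sum(a * b for a, b in zip(G[n], G[m])) / N for m in range(N)] for n in range(N)]


def power_iteration(A, steps=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix A."""
    v = [1.0] * len(A)
    for _ in range(steps):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in A]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Av = [sum(row[k] * v[k] for k in range(len(v))) for row in A]
    return sum(a * b for a, b in zip(Av, v))  # Rayleigh quotient with unit v


lam_max = power_iteration(K)
print(f"P = {P}, mean eigenvalue = {mean_eig:.4f}, max eigenvalue = {lam_max:.4f}")
```

In this setup the largest eigenvalue is at least P / N times the mean, since the trace is spread over at most N non-zero eigenvalues. As one hedged illustration of the learning rate finding: for plain gradient descent on a quadratic loss whose curvature is F, step sizes above 2 / lam_max diverge, so an estimate of lam_max gives an upper bound on a stable step size.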
Abstract
The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. To this end, we use random weights and large width limits, which enables us to utilize mean field theories. We investigate the asymptotic statistics of the FIM's eigenvalues and reveal that most of them are close to zero while the maximum eigenvalue takes a huge value. Because the landscape of the parameter space is defined by the FIM, it is locally flat in most dimensions, but strongly distorted in others. Moreover, we demonstrate the potential usage of the derived statistics in learning strategies. First, small eigenvalues that induce flatness can be connected to a norm-based capacity measure of generalization ability. Second, the…
Taxonomy
Topics: Neural Networks and Applications · Statistical Mechanics and Entropy · Machine Learning and ELM
