Mutual Information of Neural Network Initialisations: Mean Field Approximations
Jared Tanner, Giuseppe Ughi

TL;DR
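This paper introduces an information theoretic approach to analysing random neural network initialisations. It derives lower bounds on the mutual information between an input and hidden layer outputs, uses a mean field analysis to express those bounds analytically, and finds that initialisations known to train well are also superior under these bounds.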
Contribution
It offers an information theoretic analysis of neural network initialisations that complements the existing geometric analysis with analytic lower bounds on mutual information.
Findings
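Initialisations known to be optimal for training are also superior in terms of the mutual information lower bounds.
The lower bounds are analytic functions of the weight and bias variances and of the choice of nonlinear activation.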
Results align with known training-effective initializations.
Abstract
The ability to train randomly initialised deep neural networks is known to depend strongly on the variance of the weight matrices and biases as well as the choice of nonlinear activation. Here we complement the existing geometric analysis of this phenomenon with an information theoretic alternative. Lower bounds are derived for the mutual information between an input and hidden layer outputs. Using a mean field analysis we are able to provide analytic lower bounds as functions of network weight and bias variances as well as the choice of nonlinear activation. These results show that initialisations known to be optimal from a training point of view are also superior from a mutual information perspective.
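To make the role of the weight and bias variances concrete, the following is a minimal sketch of the standard mean field variance recursion, q_next = sigma_w2 * E[phi(sqrt(q) * z)^2] + sigma_b2 with z ~ N(0, 1). This recursion is assumed here to be representative of the geometric analysis the abstract refers to; it is not the paper's mutual information bound. The function names, the tanh default, and the trapezoidal integration are illustrative choices, not taken from the paper.

```python
import math


def gaussian_expectation(f, n=2001, z_max=8.0):
    """Approximate E[f(z)] for z ~ N(0, 1) with the trapezoidal rule on [-z_max, z_max]."""
    h = 2.0 * z_max / (n - 1)
    total = 0.0
    for i in range(n):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def variance_map(q, sigma_w2, sigma_b2, phi=math.tanh):
    """One layer of the mean field recursion for the pre-activation variance q.

    sigma_w2 and sigma_b2 are the (hypothetically named) weight and bias variances.
    """
    second_moment = gaussian_expectation(lambda z: phi(math.sqrt(q) * z) ** 2)
    return sigma_w2 * second_moment + sigma_b2


def propagate(q0, sigma_w2, sigma_b2, depth, phi=math.tanh):
    """Return the list of variances [q0, q1, ..., q_depth] across layers."""
    qs = [q0]
    for _ in range(depth):
        qs.append(variance_map(qs[-1], sigma_w2, sigma_b2, phi))
    return qs
```

For example, `propagate(1.0, 0.5, 0.0, 20)` shows the variance collapsing towards zero for a small weight variance with no bias, which is one way the choice of variances and activation shapes how signals propagate at initialisation.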
