Doing the impossible: Why neural networks can be trained at all
Nathan O. Hodas, Panos Stinis

TL;DR
This paper investigates why deep neural networks can be trained effectively despite their enormous size. It argues that they operate on low-dimensional manifolds, a regime made possible by high mutual information between layers, and that this accelerates training.
Contribution
The study introduces mutual information between layers as an explanation for why neural network training succeeds, and proposes adding structure to the network to raise that mutual information for faster, more accurate training.
Findings
High mutual information reduces effective parameters
Structured networks with higher mutual information train faster
Neural networks operate on low-dimensional manifolds
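To make the quantity in these findings concrete, here is a minimal sketch of a plug-in estimate of mutual information I(X;Y) between two discrete variables, such as the binarized activations of a unit in one layer and a unit in the next. This is not the authors' estimator; the summary does not say how they measure mutual information. The function name `mutual_information` and the toy "layers" below are hypothetical, chosen only for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Illustrative helper only (an assumption, not taken from the paper).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts for X
    py = Counter(ys)              # marginal counts for Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data: a binary "unit" in one layer, then two candidate downstream units.
rng = random.Random(0)
layer_a = [rng.randint(0, 1) for _ in range(2000)]
copied = list(layer_a)                                  # fully determined by layer_a
independent = [rng.randint(0, 1) for _ in range(2000)]  # unrelated to layer_a

mi_copied = mutual_information(layer_a, copied)
mi_independent = mutual_information(layer_a, independent)
print(f"I(a; copied)      = {mi_copied:.3f} bits")
print(f"I(a; independent) = {mi_independent:.3f} bits")
```

When the downstream unit is fully determined by the upstream one, the estimate approaches the entropy of the upstream unit (close to 1 bit for a fair binary variable), and when the two are independent it is close to 0. In the language of the findings, the first case is the one where knowing one layer leaves few free choices for the next.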
Abstract
As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the…
