Doing the impossible: Why neural networks can be trained at all

Nathan O. Hodas; Panos Stinis

arXiv:1805.04928·cs.LG·May 29, 2018

TL;DR

This paper asks why deep neural networks can be trained effectively despite their enormous number of weights. It argues that high mutual information between layers confines the network to a low-dimensional manifold, which accelerates training.

Contribution

The study introduces the use of mutual information between layers to explain neural network training success and proposes adding structure to enhance this mutual information for faster, more accurate training.
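To make the quantity concrete, here is a minimal sketch of a plug-in histogram estimate of mutual information between two scalar activations. It is an illustration only, not the estimator used in the paper: the function name `mutual_information`, the bin count, and the synthetic "activations" `h1`, `h2_dependent`, and `h2_independent` are all assumptions introduced for this example.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, bins=16):
    """Plug-in histogram estimate of I(X;Y) in nats for two equal-length samples."""

    def to_bins(values):
        # Map each value to an equal-width bin index in [0, bins - 1].
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant sample
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))  # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts

    # Sum p(x,y) * log(p(x,y) / (p(x) p(y))) over occupied joint bins.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Hypothetical data: one unit in a layer, and two candidate units in the next layer.
random.seed(0)
h1 = [random.gauss(0, 1) for _ in range(5000)]
h2_dependent = [math.tanh(v) + 0.1 * random.gauss(0, 1) for v in h1]
h2_independent = [random.gauss(0, 1) for _ in range(5000)]

mi_dep = mutual_information(h1, h2_dependent)
mi_ind = mutual_information(h1, h2_independent)
print(f"dependent pair:   {mi_dep:.3f} nats")
print(f"independent pair: {mi_ind:.3f} nats")
```

The dependent pair should score clearly higher than the independent one. Histogram estimates are biased upward for finite samples and scale poorly to high-dimensional layers, so a real experiment would likely need a different estimator.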

Findings

01. High mutual information between layers reduces the effective number of parameters

02. Structured networks with higher mutual information train faster

03. Neural networks operate on low-dimensional manifolds

Abstract

As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the…
