TL;DR
This book develops an effective-theory framework for understanding deep neural networks, explaining their learning dynamics, representation learning, and universal behaviors from first principles.
Contribution
It introduces the notion of representation group flow (RG flow) and analyzes how the depth-to-width ratio of a network shapes its behavior and complexity, providing new insight into the mechanisms of deep learning.
Findings
Network predictions follow nearly-Gaussian distributions, with deviations from the infinite-width Gaussian limit controlled by the depth-to-width ratio.
Tuning networks to criticality addresses the exploding and vanishing gradient problem.
Residual connections allow networks to remain useful at greater depths.
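The effect of tuning to criticality can be illustrated with a small numerical sketch. This is not code from the book; it assumes a fully connected ReLU network with zero biases and Gaussian weights of variance C_W / width, and it tracks only the forward signal magnitude as a proxy for gradient behavior. The function name `propagate_relu` and all parameter values are hypothetical choices made for this illustration. Under these assumptions, C_W = 2 is the critical value for ReLU: smaller values make the signal shrink exponentially with depth, and larger values make it grow exponentially.

```python
import numpy as np


def propagate_relu(c_w, width=512, depth=30, seed=0):
    """Return the ratio of final to initial mean squared preactivation.

    Assumptions (illustrative only): ReLU activations, zero biases,
    and weights drawn i.i.d. from a Gaussian with variance c_w / width.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=width)  # first-layer preactivations of order one
    start = np.mean(z ** 2)
    for _ in range(depth):
        weights = rng.normal(scale=np.sqrt(c_w / width), size=(width, width))
        z = weights @ np.maximum(z, 0.0)  # apply ReLU, then the next layer
    return np.mean(z ** 2) / start


if __name__ == "__main__":
    for c_w in (1.0, 2.0, 4.0):
        print(f"C_W = {c_w}: signal ratio after 30 layers = {propagate_relu(c_w):.3e}")
```

Each layer multiplies the expected squared signal by roughly C_W / 2, so only C_W = 2 keeps the magnitude of order one across many layers in this setup.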
Abstract
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be…
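The role of the depth-to-width aspect ratio can be probed numerically. The sketch below is an illustration only, not the book's method: it assumes a deep linear network (no activation function) with Gaussian weights of variance 1 / width, samples many random initializations, and estimates the excess kurtosis of a single output component for a fixed input. The function name `output_excess_kurtosis` and the chosen widths, depths, and sample counts are hypothetical. A Gaussian output has excess kurtosis zero, so larger values indicate a stronger departure from the Gaussian description.

```python
import numpy as np


def output_excess_kurtosis(width, depth, samples=20000, seed=0):
    """Estimate the excess kurtosis of one output over random initializations.

    Assumptions (illustrative only): a linear network with `depth` layers,
    all of size `width`, and weights drawn i.i.d. with variance 1 / width.
    """
    rng = np.random.default_rng(seed)
    z = np.ones((samples, width))  # the same fixed input for every sampled network
    for _ in range(depth):
        weights = rng.normal(scale=np.sqrt(1.0 / width), size=(samples, width, width))
        z = np.einsum("sij,sj->si", weights, z)  # one independent network per sample
    out = z[:, 0]  # a single output component
    m2 = np.mean(out ** 2)  # the output has zero mean by symmetry
    m4 = np.mean(out ** 4)
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    wide_shallow = output_excess_kurtosis(width=16, depth=2)   # depth/width = 0.125
    narrow_deep = output_excess_kurtosis(width=4, depth=6)     # depth/width = 1.5
    print("wide and shallow:", wide_shallow)
    print("narrow and deep: ", narrow_deep)
```

In this toy setting the wide, shallow network stays close to Gaussian, while the narrow, deep one shows heavy tails, consistent with deviations growing with the depth-to-width ratio.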
Videos
A New Physics-Inspired Theory of Deep Learning | Optimal initialization of Neural Nets (YouTube)
