Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
Greg Yang

TL;DR
This paper introduces a unified framework for analyzing wide neural networks with weight sharing, establishing their convergence to Gaussian processes, conditions for gradient independence, and the behavior of the Neural Tangent Kernel across various architectures.
Contribution
The author develops a tensor program framework that characterizes the scaling limits of wide neural networks, unifies many existing results, and yields new results on gradient independence and the Neural Tangent Kernel at initialization.
Findings
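Wide random neural networks converge to Gaussian processes for architectures including recurrent, convolutional, and residual networks, attention, and any combination thereof, with or without batch normalization.
The gradient independence assumption can be validated or corrected.
The Neural Tangent Kernel converges at initialization for diverse architectures.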
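The first and third findings can be illustrated numerically in the simplest possible setting. The following is a minimal sketch, not taken from the paper: it assumes a one-hidden-layer ReLU network f(x) = (1/sqrt(n)) * sum_i v_i * relu(w_i . x) with i.i.d. standard Gaussian weights, and compares Monte Carlo estimates of its Gaussian process kernel and its Neural Tangent Kernel at initialization against the standard closed-form arc-cosine expressions for that network. The function names (`empirical_kernels`, `limit_kernels`) and the choice of parametrization are illustrative assumptions; the paper's tensor program framework covers far more general architectures than this toy case.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def empirical_kernels(x, y, width, rng):
    """Finite-width kernel entries for the assumed toy network
    f(x) = (1/sqrt(width)) * sum_i v_i * relu(w_i . x), with w_i, v_i ~ N(0, 1).

    Returns (nngp, ntk):
      nngp: average of relu(w_i . x) * relu(w_i . y), which estimates Cov(f(x), f(y))
      ntk:  inner product of parameter gradients of f at x and y
    """
    xy = dot(x, y)
    readout_term = 0.0  # gradient w.r.t. v_i contributes relu(w_i.x) * relu(w_i.y)
    hidden_term = 0.0   # gradient w.r.t. w_i contributes v_i^2 * relu'(.) * relu'(.) * (x . y)
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        v = rng.gauss(0.0, 1.0)
        px, py = dot(w, x), dot(w, y)
        readout_term += relu(px) * relu(py)
        if px > 0.0 and py > 0.0:
            hidden_term += v * v * xy
    nngp = readout_term / width
    ntk = nngp + hidden_term / width
    return nngp, ntk


def limit_kernels(x, y):
    """Infinite-width limits of the two quantities above (arc-cosine formulas)."""
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    cos_t = max(-1.0, min(1.0, dot(x, y) / (nx * ny)))
    theta = math.acos(cos_t)
    nngp = nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)
    ntk = nngp + dot(x, y) * (math.pi - theta) / (2.0 * math.pi)
    return nngp, ntk


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [1.0, 0.0], [0.6, 0.8]
    for width in (100, 1000, 10000):
        print(width, empirical_kernels(x, y, width, rng), limit_kernels(x, y))
```

As the width grows, the empirical values should fluctuate less and approach the closed-form limits, which is the kind of statement the paper establishes in much greater generality.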
Abstract
Several recent trends in machine learning theory and practice, from the design of state-of-the-art Gaussian Process to the convergence analysis of deep neural nets (DNNs) under stochastic gradient descent (SGD), have found it fruitful to study wide random neural networks. Central to these approaches are certain scaling limits of such networks. We unify these results by introducing a notion of a straightline tensor program that can express most neural network computations, and we characterize its scaling limit when its tensors are large and randomized. From our framework follows (1) the convergence of random neural networks to Gaussian processes for architectures such as recurrent neural networks, convolutional neural networks, residual networks, attention, and any combination thereof, with or without batch normalization; (2) conditions under which the gradient independence…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
Methods: Gaussian Process · Stochastic Gradient Descent
