Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

Greg Yang

arXiv:1902.04760·cs.NE·April 7, 2020·187 cites

TL;DR

This paper introduces a unified framework for analyzing wide neural networks with weight sharing, establishing their convergence to Gaussian processes, conditions for gradient independence, and the behavior of the Neural Tangent Kernel across various architectures.

Contribution

The author develops a tensor program framework that characterizes the scaling limits of wide neural networks, unifying many existing results and deriving new insights into their training dynamics.

Findings

01

Wide random neural networks converge to Gaussian processes across many architectures.

02

The gradient independence assumption can be validated or corrected.

03

The Neural Tangent Kernel converges at initialization for diverse architectures.
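
The first finding can be illustrated in the simplest possible setting. The sketch below is not the paper's tensor program framework. It assumes a single hidden ReLU layer with i.i.d. standard Gaussian weights and checks that the empirical kernel, averaged over hidden units, approaches a deterministic limit as the width grows. The limit used here is the well-known degree-1 arc-cosine closed form for E[relu(w·x) relu(w·y)]. The function names `empirical_kernel` and `limit_kernel` and the inputs `x`, `y` are hypothetical, chosen for illustration.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def empirical_kernel(x, y, width, rng):
    """Average relu(w.x) * relu(w.y) over `width` hidden units, w ~ N(0, I)."""
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        pre_x = sum(wi * xi for wi, xi in zip(w, x))
        pre_y = sum(wi * yi for wi, yi in zip(w, y))
        total += relu(pre_x) * relu(pre_y)
    return total / width


def limit_kernel(x, y):
    """Closed form of E[relu(w.x) relu(w.y)] for w ~ N(0, I) (arc-cosine kernel)."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    theta = math.acos(cos_theta)
    angular = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    return norm_x * norm_y * angular / (2.0 * math.pi)


# Illustrative inputs (assumptions, not taken from the paper).
x = [1.0, 0.0, 0.0]
y = [0.6, 0.8, 0.0]

rng = random.Random(0)
for width in (100, 10_000):
    estimate = empirical_kernel(x, y, width, rng)
    print(f"width={width:>6}  empirical={estimate:.4f}  limit={limit_kernel(x, y):.4f}")
```

As the width increases, the empirical value should settle near the closed-form limit. This concentration of the kernel is the kind of scaling-limit statement the paper generalizes to architectures with weight sharing.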

Abstract

Several recent trends in machine learning theory and practice, from the design of state-of-the-art Gaussian processes to the convergence analysis of deep neural nets (DNNs) under stochastic gradient descent (SGD), have found it fruitful to study wide random neural networks. Central to these approaches are certain scaling limits of such networks. We unify these results by introducing a notion of a straightline tensor program that can express most neural network computations, and we characterize its scaling limit when its tensors are large and randomized. From our framework follows (1) the convergence of random neural networks to Gaussian processes for architectures such as recurrent neural networks, convolutional neural networks, residual networks, attention, and any combination thereof, with or without batch normalization; (2) conditions under which the gradient independence…

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications

Methods: Gaussian Process · Stochastic Gradient Descent