Non-asymptotic approximations of neural networks by Gaussian processes

Ronen Eldan; Dan Mikulincer; Tselil Schramm

arXiv:2102.08668 · math.PR · February 18, 2021 · 5 cites

TL;DR

This paper quantifies how closely wide, randomly initialized neural networks are approximated by Gaussian processes. It gives explicit convergence rates that depend on the activation function: on its degree when it is polynomial, and on its smoothness otherwise.

Contribution

The paper establishes explicit, non-asymptotic rates for the convergence of wide neural networks to Gaussian processes. The rate is determined by the degree of the activation function when it is polynomial and by its smoothness when it is not.

Findings

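01 For polynomial activations, the degree of the polynomial determines the convergence rate; for non-polynomial activations, the rate is governed by the smoothness of the function.

02 The rates are explicit and hold for the central limit theorem in an infinite-dimensional functional space, metrized with a natural transportation distance.

03 The results apply to wide neural networks initialized with random weights.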

Abstract

We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights. It is a well-established fact that as the width of a network goes to infinity, its law converges to that of a Gaussian process. We make this quantitative by establishing explicit convergence rates for the central limit theorem in an infinite-dimensional functional space, metrized with a natural transportation distance. We identify two regimes of interest: when the activation function is polynomial, its degree determines the rate of convergence, while for non-polynomial activations, the rate is governed by the smoothness of the function.
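The qualitative statement in the abstract, that the law of a randomly initialized network approaches a Gaussian as the width grows, can be illustrated numerically. The sketch below is not the paper's construction or its transportation-distance bound. It assumes a toy two-layer network f(x) = width^(-1/2) * sum_i s_i * relu(w_i . x) with standard Gaussian hidden weights w_i and random signs s_i, and it looks only at the one-dimensional marginal at a single unit-norm input. The function names, the ReLU activation, and the use of excess kurtosis as a crude Gaussianity check are all assumptions made for illustration.

```python
import numpy as np


def sample_network_outputs(width, x, n_samples, rng):
    """Evaluate n_samples independently initialized toy networks at input x.

    Assumed model (illustrative, not taken from the paper):
        f(x) = width**-0.5 * sum_i s_i * relu(w_i . x)
    with w_i standard Gaussian vectors and s_i uniform random signs.
    """
    d = x.shape[0]
    w = rng.standard_normal((n_samples, width, d))         # hidden-layer weights
    s = rng.choice([-1.0, 1.0], size=(n_samples, width))   # output-layer signs
    hidden = np.maximum(w @ x, 0.0)                        # ReLU, shape (n_samples, width)
    return (s * hidden).sum(axis=1) / np.sqrt(width)


def excess_kurtosis(values):
    """Sample excess kurtosis; it is 0 for an exactly Gaussian distribution."""
    centered = values - values.mean()
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0


rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 2.0]) / 3.0   # unit-norm input, so w_i . x is standard Gaussian

results = {}
for width in (1, 4, 64):
    outputs = sample_network_outputs(width, x, n_samples=20000, rng=rng)
    results[width] = (outputs.var(), excess_kurtosis(outputs))
    print(f"width={width:3d}  variance={results[width][0]:.3f}  "
          f"excess kurtosis={results[width][1]:.3f}")
```

In this toy model the output variance stays near 1/2 at every width, while the excess kurtosis shrinks roughly like 3/width, so the marginal looks increasingly Gaussian. This is only a single-point, fourth-moment check; the paper's results concern the whole random function in an infinite-dimensional space.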

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Neural Networks and Applications · Stochastic Gradient Optimization Techniques