A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks

Zhengdao Chen; Eric Vanden-Eijnden; Joan Bruna

arXiv:2210.16286·cs.LG·October 31, 2022

TL;DR

This paper develops a mean-field theory for a partially-trained three-layer neural network whose first layer is random and fixed, showing that in the infinite-width limit the training loss decays to zero at a linear rate and that the model exhibits feature learning.

Contribution

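The paper generalizes the mean-field theory of two-layer networks to a three-layer model whose first layer is random and fixed. It treats the neurons as elements of functional spaces in order to define the infinite-width limit rigorously, and then analyzes the training dynamics of that limit.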
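To make the "partially-trained" setup concrete, here is a minimal finite-width sketch in NumPy. It is an illustration under stated assumptions, not the authors' model: the names `init_params`, `forward`, and `train_step`, the `tanh` activation, the shared hidden width `m`, the `1/sqrt(d)` first-layer initialization, and the `1/m` averaging at each trained layer are all hypothetical choices made here. The only property taken from the paper's description is that the first layer is random and never updated, while the later layers are trained.

```python
import numpy as np


def init_params(d, m, seed=0):
    """Random initialization; the shapes and scalings are assumptions."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((m, d)) / np.sqrt(d),  # first layer: random, fixed
        "W2": rng.standard_normal((m, m)),               # second layer: trained
        "a": rng.standard_normal(m),                     # output layer: trained
    }


def forward(params, X):
    """Three-layer forward pass with 1/m (mean-field style) averaging."""
    m = params["a"].shape[0]
    h1 = np.tanh(X @ params["W1"].T)          # (n, m) fixed random features
    h2 = np.tanh(h1 @ params["W2"].T / m)     # (n, m) trained hidden layer
    return h2 @ params["a"] / m               # (n,) scalar output per input


def train_step(params, X, y, lr):
    """One gradient step on the loss (1 / 2n) * sum((f(x_i) - y_i)^2).

    Only W2 and a are updated; W1 is returned unchanged.
    """
    n, m = X.shape[0], params["a"].shape[0]
    h1 = np.tanh(X @ params["W1"].T)
    h2 = np.tanh(h1 @ params["W2"].T / m)
    r = h2 @ params["a"] / m - y              # residuals, shape (n,)

    grad_a = h2.T @ r / (n * m)
    # Backpropagate through the output layer and the tanh of the second layer.
    d_pre2 = np.outer(r, params["a"]) / (n * m) * (1.0 - h2**2)
    grad_W2 = d_pre2.T @ h1 / m

    return {
        "W1": params["W1"],                   # never updated
        "W2": params["W2"] - lr * grad_W2,
        "a": params["a"] - lr * grad_a,
    }
```

Calling `train_step` repeatedly changes `W2` and `a` while `W1` stays at its random initial value. The mean-field limit studied in the paper corresponds to letting the width grow without bound, which this finite-width sketch does not attempt to reproduce.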

Findings

01

The training loss in $L_{2}$ regression decays to zero at a linear rate under the mean-field training dynamics.

02

The theory captures feature learning in different scaling regimes.

03

The paper proves Rademacher complexity bounds for function spaces that include the solutions obtainable through the mean-field training dynamics.

Abstract

To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical guarantees of its convergence under gradient flow training as well as its approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, we prove that its training loss in $L_{2}$ regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove…
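The step from "positive-definite kernel" to "linear rate" can be illustrated with a standard argument. The following is a sketch under assumptions, not the paper's proof: the notation ($n$ training points $(x_i, y_i)$, predictor $f_t$, kernel $K_t$, residual vector $r(t)$, constant $\lambda$) and the loss normalization are introduced here for illustration, and the sketch assumes a lower bound $\lambda > 0$ on the smallest eigenvalue of the kernel Gram matrix that holds uniformly in time.

$$
L(t) = \frac{1}{2n} \sum_{i=1}^{n} r_i(t)^2, \qquad r_i(t) = f_t(x_i) - y_i,
$$

$$
\frac{d}{dt} f_t(x) = -\frac{1}{n} \sum_{j=1}^{n} K_t(x, x_j)\, r_j(t)
\quad\Longrightarrow\quad
\frac{d}{dt} L(t) = -\frac{1}{n^2}\, r(t)^{\top} K_t\, r(t) \le -\frac{\lambda}{n^2}\, \lVert r(t) \rVert^2 = -\frac{2\lambda}{n}\, L(t),
$$

$$
L(t) \le L(0)\, e^{-2\lambda t / n}.
$$

Here $K_t$ in the quadratic form denotes the $n \times n$ Gram matrix with entries $K_t(x_i, x_j)$. "Linear rate" is used in the optimization sense: the loss shrinks by a constant factor per unit of time, so it decays exponentially in $t$ rather than along a straight line.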

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques