On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization

Amir Joudaki; Hadi Daneshmand; Francis Bach

arXiv:2205.13076 · cs.LG · February 22, 2023

TL;DR

This paper investigates how batch normalization stabilizes the representations of deep random neural networks at initialization. By preventing layer-wise errors from amplifying, it keeps mean field predictions accurate even at infinite depth and finite width.

Contribution

The paper shows that batch normalization keeps mean field predictions stable in deep networks, which makes it possible to analyze infinitely deep networks of finite width.

Findings

01

Batch normalization prevents layer-wise errors in mean field predictions from propagating and amplifying with depth.

02

This stabilization is characterized by a geometric mixing property.

03

Concentration bounds are established for mean field predictions in infinitely deep networks of finite width.
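Geometric mixing is usually stated for a Markov chain. Here the natural chain is the sequence of layer representations, which is Markov when each layer depends only on the previous one and on freshly drawn random weights. The textbook form of geometric ergodicity below is shown only for orientation; the paper's exact statement (choice of distance, constants, and state space) may differ. Informally, after $\ell$ layers the distribution of the representation depends on the input $x$ only through a term that shrinks like $\rho^{\ell}$.

```latex
% Geometric ergodicity of a Markov chain with transition kernel P and invariant
% distribution \pi (standard definition; not necessarily the paper's formulation).
\[
  \bigl\| P^{\ell}(x,\cdot) - \pi \bigr\|_{\mathrm{TV}} \;\le\; C(x)\,\rho^{\ell},
  \qquad C(x) < \infty,\quad 0 \le \rho < 1,\quad \ell = 1, 2, \dots
\]
```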

Abstract

Mean field theory is widely used in the theoretical study of neural networks. In this paper, we analyze the role of depth in the concentration of mean field predictions, specifically for deep multilayer perceptrons (MLPs) with batch normalization (BN) at initialization. Mean field predictions are obtained by scaling the network width to infinity, and it is postulated that, at finite width, they suffer from layer-wise errors that amplify with depth. We demonstrate that BN stabilizes the distribution of representations, which prevents the errors of mean field predictions from propagating through depth. This stabilization, which is characterized by a geometric mixing property, allows us to establish concentration bounds for mean field predictions in infinitely deep neural networks with finite width.
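The setting in the abstract (a deep MLP with batch normalization at random initialization, studied at finite width) can be explored numerically. The sketch below is not the authors' code: the architecture h_{l+1} = BN(tanh(W_l h_l)), the 1/fan_in weight variance, the absence of biases and learnable BN parameters, and the helper names `batch_norm`, `gram_trajectory`, and `seed_spread` are all illustrative assumptions. It tracks the width-normalized Gram matrix of a small batch across layers and measures, at each depth, how much that matrix varies across independent weight draws, a rough proxy for how concentrated a finite-width network is around a deterministic prediction.

```python
import numpy as np


def batch_norm(h: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each feature (row) of a width x batch matrix across the batch."""
    centered = h - h.mean(axis=1, keepdims=True)
    return centered / (centered.std(axis=1, keepdims=True) + eps)


def gram(h: np.ndarray) -> np.ndarray:
    """Width-normalized Gram matrix H^T H / width of the batch representations."""
    return h.T @ h / h.shape[0]


def gram_trajectory(x: np.ndarray, width: int, depth: int, seed: int,
                    activation=np.tanh) -> list:
    """Gram matrices of h_{l+1} = BN(activation(W_l h_l)) for l = 0..depth-1.

    Assumed (hypothetical) architecture: i.i.d. Gaussian weights with variance
    1/fan_in, no biases, and BN without learnable scale/shift after every layer.
    """
    rng = np.random.default_rng(seed)
    h = x
    grams = []
    for _ in range(depth):
        fan_in = h.shape[0]
        w = rng.standard_normal((width, fan_in)) / np.sqrt(fan_in)
        h = batch_norm(activation(w @ h))
        grams.append(gram(h))
    return grams


def seed_spread(x: np.ndarray, width: int, depth: int, n_seeds: int = 8) -> np.ndarray:
    """Per layer: largest std. dev. (over Gram entries) across independent weight draws."""
    runs = np.stack([np.stack(gram_trajectory(x, width, depth, seed=s))
                     for s in range(n_seeds)])
    # runs has shape (n_seeds, depth, batch, batch)
    return runs.std(axis=0).max(axis=(1, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 4))  # 16 input features, batch of 4 samples
    for width in (32, 256):
        spread = seed_spread(x, width=width, depth=50)
        print(f"width={width}: spread at layers 1, 10, 50 = {spread[[0, 9, 49]]}")
```

Because BN standardizes every feature across the batch, each Gram matrix here has trace equal to the batch size and maps the all-ones vector to zero; these are properties of the construction, not findings. Comparing the spread across widths and depths is one quick way to probe the kind of concentration the paper studies; the printed numbers depend on the assumed architecture and are not results from the paper.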

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications

Methods: Batch Normalization