# Mean-field Analysis of Batch Normalization

**Authors:** Mingwei Wei, James Stokes, David J Schwab

arXiv: 1903.02606 · 2019-03-08

## TL;DR

This paper uses mean-field theory to show that Batch Normalization flattens the loss landscape of multi-layer networks, as measured by the maximum eigenvalue of the Fisher Information Matrix. The flatter landscape justifies higher learning rates, and experiments support the predicted maximum learning rate.

## Contribution

The paper provides a mean-field framework that quantifies BatchNorm's effect on the geometry of the loss landscape and on the maximal learning rate that still ensures convergence, and it supports both with experiments.

## Key findings

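- BatchNorm flattens the loss landscape, as quantified by the maximum eigenvalue of the Fisher Information Matrix.
- The flattening justifies larger learning rates for BatchNorm networks; the paper characterizes the maximal learning rate that still ensures convergence, and experiments support the predicted value.
- Experiments suggest that networks with smaller values of the BatchNorm parameter reach lower loss after the same number of training epochs.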
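
The link between a flatter landscape and a larger usable learning rate can be illustrated on a toy problem. The sketch below is not the paper's mean-field calculation: it assumes a purely quadratic loss with hand-picked curvatures (the lists `sharp` and `flat` and all function names are hypothetical), for which plain gradient descent converges only when the learning rate stays below 2 / λ_max. The paper's own bound is derived in the full text.

```python
def gradient_descent_final_loss(curvatures, learning_rate, steps=200):
    """Run gradient descent on L(w) = 0.5 * sum(c_i * w_i**2) from w = (1, ..., 1)."""
    weights = [1.0] * len(curvatures)
    for _ in range(steps):
        # The gradient of the quadratic along axis i is c_i * w_i.
        weights = [w - learning_rate * c * w for w, c in zip(weights, curvatures)]
    return 0.5 * sum(c * w * w for w, c in zip(weights, curvatures))


def max_stable_learning_rate(curvatures):
    """For a quadratic loss, each step contracts only if lr < 2 / lambda_max."""
    return 2.0 / max(curvatures)


# Hypothetical curvature spectra (assumptions for illustration only).
sharp = [10.0, 1.0, 0.1]  # largest curvature 10
flat = [1.0, 0.5, 0.1]    # largest curvature 1

sharp_limit = max_stable_learning_rate(sharp)
flat_limit = max_stable_learning_rate(flat)

initial_sharp = 0.5 * sum(sharp)  # loss at the starting point w = (1, 1, 1)
initial_flat = 0.5 * sum(flat)

# Just below the limit the loss shrinks; just above it the loss blows up.
loss_below = gradient_descent_final_loss(sharp, 0.9 * sharp_limit)
loss_above = gradient_descent_final_loss(sharp, 1.1 * sharp_limit)

# The same "too large" step is stable on the flatter landscape.
loss_flat = gradient_descent_final_loss(flat, 1.1 * sharp_limit)

print("sharp limit:", sharp_limit, "flat limit:", flat_limit)
print("sharp, below limit:", loss_below)
print("sharp, above limit:", loss_above)
print("flat, same step:", loss_flat)
```

Reducing the largest curvature by a factor of ten raises the stability limit by the same factor, which is the sense in which a flatter landscape permits a larger learning rate in this toy setting.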

## Abstract

Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We show that it has a flattening effect on the loss landscape, as quantified by the maximum eigenvalue of the Fisher Information Matrix. These findings are then used to justify the use of larger learning rates for networks that use BatchNorm, and we provide quantitative characterization of the maximal allowable learning rate to ensure convergence. Experiments support our theoretically predicted maximum learning rate, and furthermore suggest that networks with smaller values of the BatchNorm parameter achieve lower loss after the same number of epochs of training.
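
To make the flatness measure concrete, here is a minimal sketch that estimates the largest eigenvalue of a Fisher Information Matrix by power iteration. It rests on several assumptions that are not taken from the paper: a single linear layer with a unit-variance Gaussian likelihood (for which the Fisher matrix is the average of x xᵀ over the inputs), synthetic inputs with a feature mean of 3, and per-feature standardization of the inputs as a crude stand-in for BatchNorm. All function names are hypothetical.

```python
import random


def second_moment_matrix(inputs):
    """Fisher matrix of y ~ N(w.x, 1): the average of x x^T over the inputs."""
    n, d = len(inputs), len(inputs[0])
    return [[sum(x[i] * x[j] for x in inputs) / n for j in range(d)] for i in range(d)]


def largest_eigenvalue(matrix, iterations=200):
    """Power iteration for a symmetric positive semi-definite matrix."""
    d = len(matrix)
    vector = [1.0 / d ** 0.5] * d  # unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(d)) for i in range(d)]
        eigenvalue = sum(p * p for p in product) ** 0.5  # norm of M v for unit v
        vector = [p / eigenvalue for p in product]
    return eigenvalue


def standardize(inputs):
    """Shift and scale each feature to zero mean and unit variance over the batch."""
    n, d = len(inputs), len(inputs[0])
    means = [sum(x[i] for x in inputs) / n for i in range(d)]
    stds = [(sum((x[i] - means[i]) ** 2 for x in inputs) / n) ** 0.5 for i in range(d)]
    return [[(x[i] - means[i]) / stds[i] for i in range(d)] for x in inputs]


random.seed(0)
num_features = 5
# Synthetic inputs with a non-zero mean (an assumption chosen for illustration).
raw_inputs = [[random.gauss(3.0, 1.0) for _ in range(num_features)] for _ in range(1000)]

lambda_raw = largest_eigenvalue(second_moment_matrix(raw_inputs))
lambda_normalized = largest_eigenvalue(second_moment_matrix(standardize(raw_inputs)))

print("largest eigenvalue, raw inputs:", lambda_raw)
print("largest eigenvalue, standardized inputs:", lambda_normalized)
```

In this toy setting the largest eigenvalue for the raw inputs is dominated by the squared feature means, while for the standardized inputs it cannot exceed the number of features, because the matrix then has ones on its diagonal. The paper's analysis concerns multi-layer fully-connected and convolutional networks, which this sketch does not attempt to reproduce.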

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.02606/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.02606/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1903.02606/full.md

---
Source: https://tomesphere.com/paper/1903.02606