A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent
Yongqiang Cai, Qianxiao Li, Zuowei Shen

TL;DR
This paper provides a quantitative analysis of how batch normalization improves the convergence and stability of gradient descent, especially in simple linear models, and confirms these effects in more complex settings.
Contribution
It offers a quantitative analysis of batch normalization's effect on the convergence and stability of gradient descent, identifying two sources of acceleration and robustness to the choice of learning rate.
Findings
Unlike GD, BNGD converges for arbitrary learning rates for the weights
BNGD's convergence remains linear under mild conditions
Over-parameterization and a large range of usable learning rates both accelerate BNGD relative to GD
Abstract
Despite its empirical success and recent theoretical progress, a quantitative analysis of the effect of batch normalization (BN) on the convergence and stability of gradient descent is generally lacking. In this paper, we provide such an analysis on the simple problem of ordinary least squares (OLS). Since the precise dynamical properties of gradient descent (GD) are completely known for the OLS problem, it allows us to isolate and compare the additional effects of BN. More precisely, we show that unlike GD, gradient descent with BN (BNGD) converges for arbitrary learning rates for the weights, and the convergence remains linear under mild conditions. Moreover, we quantify two different sources of acceleration of BNGD over GD -- one due to over-parameterization, which improves the effective condition number, and another due to having a large range of learning rates giving rise to fast descent.…
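The contrast between GD and BNGD described in the abstract can be illustrated with a minimal sketch. It assumes, as an illustration rather than a quotation from the paper, that the batch-normalized model output is a · xᵀw / σ with σ² = wᵀHw and H = E[xxᵀ], and that the data are noiseless. The matrix H, the weights w*, the initial values, and the function names below are all hypothetical choices made for this example.

```python
import math

# Hypothetical noiseless OLS problem in eigen-coordinates (illustrative values).
# H = E[x x^T] is taken to be diagonal, so it is stored as its eigenvalues.
H_DIAG = [1.0, 10.0]
W_STAR = [1.0, 1.0]
G = [h * w for h, w in zip(H_DIAG, W_STAR)]          # g = E[x y] = H w*
C = sum(h * w * w for h, w in zip(H_DIAG, W_STAR))   # E[y^2] = w*^T H w*


def gd_loss(w):
    """Plain OLS loss 0.5 * (w - w*)^T H (w - w*)."""
    return 0.5 * sum(h * (wi - ws) ** 2 for h, wi, ws in zip(H_DIAG, w, W_STAR))


def bn_loss(a, w):
    """Assumed normalized loss 0.5 * E[(y - a * x^T w / sigma)^2]."""
    sigma = math.sqrt(sum(h * wi * wi for h, wi in zip(H_DIAG, w)))
    wg = sum(wi * gi for wi, gi in zip(w, G))
    return 0.5 * C - a * wg / sigma + 0.5 * a * a


def run_gd(w0, lr, steps):
    w = list(w0)
    for _ in range(steps):
        # The gradient of gd_loss is H w - g.
        w = [wi - lr * (h * wi - gi) for wi, h, gi in zip(w, H_DIAG, G)]
    return w


def run_bngd(a0, w0, lr, lr_a, steps):
    a, w = a0, list(w0)
    for _ in range(steps):
        sigma = math.sqrt(sum(h * wi * wi for h, wi in zip(H_DIAG, w)))
        wg = sum(wi * gi for wi, gi in zip(w, G))
        # Gradients of bn_loss with respect to the scale a and the weights w.
        grad_a = a - wg / sigma
        grad_w = [-(a / sigma) * (gi - (wg / sigma ** 2) * h * wi)
                  for gi, h, wi in zip(G, H_DIAG, w)]
        a = a - lr_a * grad_a
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
    return a, w


W0 = [1.0, 0.0]
LR = 1.0       # above 2 / lambda_max = 0.2, the stability limit of plain GD here
STEPS = 60

gd_final = gd_loss(run_gd(W0, LR, STEPS))
a_fin, w_fin = run_bngd(1.0, W0, LR, 0.5, STEPS)
bn_final = bn_loss(a_fin, w_fin)
print(f"GD loss after {STEPS} steps:   {gd_final:.3e}")
print(f"BNGD loss after {STEPS} steps: {bn_final:.3e}")
```

With this learning rate the plain GD iterate is multiplied by 1 - 10 = -9 along the stiff direction at every step, so its loss grows without bound. The normalized loss, by contrast, stays bounded: the weight update is orthogonal to w in the H-inner product, so σ can only grow and the effective step size shrinks on its own. The sketch does not attempt to reproduce the paper's convergence rates.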
Taxonomy
Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: Batch Normalization
