TL;DR
This paper introduces a new modular statistical framework and a novel metric called Block Dynamical Isometry to evaluate gradient norm behavior in deep neural networks, enabling better analysis and improvements of network initialization, normalization, and architecture.
Contribution
It proposes a weak-assumption, norm-based metric and a modular framework for analyzing gradient norms in complex DNNs, along with practical normalization and initialization strategies.
Findings
Gradient Norm Equality is a universal principle in DNNs.
Second moment normalization is 30% faster than batch normalization.
The framework effectively analyzes diverse network components and structures.
Abstract
In recent years, plenty of metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and complex serial-parallel hybrid connections in modern DNNs, the evaluation of existing metrics usually requires strong assumptions, complex statistical analysis, or has limited application fields, which constraints their spread in the community. In this paper, inspired by the Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in individual block. Because our Block Dynamical Isometry is norm-based, its evaluation needs weaker assumptions compared with the original dynamical isometry. To mitigate the challenging derivation, we propose a highly modularized statistical framework based on free probability. Our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsBatch Normalization
