A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Zhaodong Chen; Lei Deng; Bangyan Wang; Guoqi Li; Yuan Xie

arXiv:2001.00254·cs.LG·August 13, 2020

TL;DR

This paper introduces a new modular statistical framework and a novel metric called Block Dynamical Isometry to evaluate gradient norm behavior in deep neural networks, enabling better analysis and improvements of network initialization, normalization, and architecture.

Contribution

It proposes a weak-assumption, norm-based metric and a modular framework for analyzing gradient norms in complex DNNs, along with practical normalization and initialization strategies.

Findings

01

Gradient Norm Equality is a universal principle in DNNs.

02

Second moment normalization is theoretically 30% faster than batch normalization.

03

The framework effectively analyzes diverse network components and structures.

Abstract

In recent years, many metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and the complex serial-parallel hybrid connections in modern DNNs, evaluating existing metrics usually requires strong assumptions or complex statistical analysis, or applies only to a limited range of networks, which constrains their spread in the community. In this paper, inspired by the Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in an individual block. Because our Block Dynamical Isometry is norm-based, its evaluation needs weaker assumptions compared with the original dynamical isometry. To mitigate the challenging derivation, we propose a highly modularized statistical framework based on free probability. Our…
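
The idea of measuring how one block changes the gradient norm can be illustrated numerically. The following is a minimal sketch, not the paper's method or code from the GNEDNN_release repository: it assumes a single square dense block with i.i.d. Gaussian weights, and the function name `grad_norm_ratio`, the width of 64, and the weight variances are hypothetical choices made for illustration. It estimates the average ratio of squared gradient norms before and after backpropagation through the block; a ratio near 1 means the block neither amplifies nor shrinks the gradient on average.

```python
import math
import random


def grad_norm_ratio(width, weight_std, use_relu, trials=40, seed=0):
    """Average ||dL/dx||^2 / ||dL/dy||^2 for one block y = f(W x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Fresh i.i.d. Gaussian weights, input, and upstream gradient.
        W = [[rng.gauss(0.0, weight_std) for _ in range(width)]
             for _ in range(width)]
        x = [rng.gauss(0.0, 1.0) for _ in range(width)]
        g_out = [rng.gauss(0.0, 1.0) for _ in range(width)]

        if use_relu:
            # ReLU passes the gradient only where the pre-activation is positive.
            pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
            g_pre = [g if p > 0.0 else 0.0 for g, p in zip(g_out, pre)]
        else:
            g_pre = g_out

        # Backpropagate through the linear map: dL/dx = W^T g_pre.
        g_in = [sum(W[i][j] * g_pre[i] for i in range(width))
                for j in range(width)]

        total += sum(v * v for v in g_in) / sum(v * v for v in g_out)
    return total / trials


if __name__ == "__main__":
    n = 64
    print("linear, var 1/n:", grad_norm_ratio(n, math.sqrt(1.0 / n), False))
    print("ReLU,   var 2/n:", grad_norm_ratio(n, math.sqrt(2.0 / n), True))
    print("ReLU,   var 1/n:", grad_norm_ratio(n, math.sqrt(1.0 / n), True))
```

Under these assumptions, the first two configurations should give a ratio close to 1, while the third (ReLU without the factor of 2 in the weight variance) should give a ratio close to 0.5, so stacking many such blocks would shrink the gradient norm geometrically with depth.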

Code & Models

Repositories

apuaaChen/GNEDNN_release
PyTorch, official

Taxonomy

Methods: Batch Normalization