Norm matters: efficient and accurate normalization schemes in deep networks

Elad Hoffer; Ron Banner; Itay Golan; Daniel Soudry

arXiv:1803.01814·stat.ML·February 8, 2019·52 cites

TL;DR

This paper offers a new perspective on normalization in deep networks, proposing alternative schemes like L1 and L-infinity normalization that enhance stability and efficiency, especially in low-precision settings.

Contribution

The paper introduces batch-normalization variants in $L^{1}$ and $L^{\infty}$ spaces, relates normalization, weight decay, and learning-rate adjustments to one another, and modifies weight normalization to improve its performance on large-scale tasks.
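The link between normalization, weight decay, and learning rate can be made concrete with a short derivation. The sketch below assumes a loss $f$ that is invariant to rescaling of a weight vector $w$ (as holds for weights feeding a batch-normalized layer) and plain SGD with step size $\eta$. The symbols $f$, $w$, $\alpha$, $\eta$, and $\hat{w}$ are notation chosen here for illustration, not taken from the summary above.

```latex
\begin{align}
  % Assumed scale invariance, for every \alpha > 0:
  f(\alpha w) &= f(w) \\
  % Differentiating both sides with respect to w:
  \nabla f(\alpha w) &= \frac{1}{\alpha}\, \nabla f(w) \\
  % Writing w = \|w\|_2 \hat{w} with \hat{w} = w / \|w\|_2:
  \nabla f(w) &= \frac{1}{\|w\|_2}\, \nabla f(\hat{w}) \\
  % One SGD step w_{t+1} = w_t - \eta \nabla f(w_t), to first order in \eta,
  % using w_t \cdot \nabla f(w_t) = 0 (also a consequence of scale invariance):
  \hat{w}_{t+1} &\approx \hat{w}_t - \frac{\eta}{\|w_t\|_2^{2}}\, \nabla f(\hat{w}_t)
\end{align}
```

Under these assumptions the direction of $w$ moves with an effective step size of $\eta / \|w_t\|_2^{2}$, so anything that shrinks the weight norm, such as weight decay, acts like a larger learning rate on the direction.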

Findings

1. L1 and L-infinity normalization improve numerical stability in low-precision implementations.

2. The proposed methods allow batch-norm alternatives to run in half precision.

3. A modified weight normalization improves performance on large-scale tasks.

Abstract

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay and learning-rate adjustments. We suggest several alternatives to the widely used $L^{2}$ batch-norm, using normalization in $L^{1}$ and $L^{\infty}$ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods…
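To illustrate what normalizing in $L^{1}$ rather than $L^{2}$ can look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `eps` value, and the omission of learnable scale and shift parameters and running statistics are all assumptions made for brevity. The constant $\sqrt{\pi/2}$ is used because, for Gaussian data, the standard deviation equals $\sqrt{\pi/2}$ times the mean absolute deviation, so the L1 statistic can stand in for the standard deviation without computing squares or square roots over the batch.

```python
import math
import random


def l2_batch_norm(batch, eps=1e-5):
    """Reference: center the batch and divide by its standard deviation."""
    n = len(batch)
    mean = sum(batch) / n
    centered = [x - mean for x in batch]
    var = sum(c * c for c in centered) / n
    return [c / math.sqrt(var + eps) for c in centered]


def l1_batch_norm(batch, eps=1e-5):
    """Sketch: center the batch and divide by a scaled mean absolute deviation.

    The sqrt(pi / 2) factor assumes roughly Gaussian activations, for which
    std = sqrt(pi / 2) * E|x - mean|. Only sums of absolute values are taken
    over the batch; no per-element squaring is needed.
    """
    n = len(batch)
    mean = sum(batch) / n
    centered = [x - mean for x in batch]
    mean_abs_dev = sum(abs(c) for c in centered) / n
    scale = math.sqrt(math.pi / 2.0) * mean_abs_dev
    return [c / (scale + eps) for c in centered]


def std(values):
    """Population standard deviation, used only to compare the two schemes."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)


# Hypothetical activations: Gaussian with mean 3 and standard deviation 2.
rng = random.Random(0)
activations = [rng.gauss(3.0, 2.0) for _ in range(10000)]

l1_out = l1_batch_norm(activations)
l2_out = l2_batch_norm(activations)

# Both outputs should have a standard deviation close to 1.
print(std(l1_out), std(l2_out))
```

On Gaussian inputs the two schemes should produce nearly the same output scale. On heavy-tailed or strongly non-Gaussian activations the $\sqrt{\pi/2}$ correction is only approximate, which is one reason to treat this as a sketch rather than a drop-in replacement.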

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: Weight Decay