L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks

Shuang Wu; Guoqi Li; Lei Deng; Liu Liu; Yuan Xie; Luping Shi

arXiv:1802.09769 · cs.LG · May 23, 2019

TL;DR

This paper introduces L1-Norm Batch Normalization (L1BN), a normalization method that uses only linear operations. It accelerates training, reduces power consumption, and facilitates low-bit quantization in deep neural networks.

Contribution

L1BN replaces the nonlinear square and square-root operations in batch normalization with linear ones, maintaining accuracy while improving computational efficiency and hardware friendliness.
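The substitution can be written out as a short worked derivation. This is a sketch under stated assumptions: the symbols (batch size m, activations x_i, batch mean mu) are illustrative notation that does not appear in this summary, and the closed-form ratio holds exactly only for Gaussian-distributed activations.

```latex
% Batch mean over m activations (notation assumed for illustration)
\mu = \frac{1}{m} \sum_{i=1}^{m} x_i

% L2BN statistic: needs a square per element and one square root
\sigma_{L2} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2}

% L1BN statistic: needs only an absolute value per element
\sigma_{L1} = \frac{1}{m} \sum_{i=1}^{m} \lvert x_i - \mu \rvert

% For x \sim \mathcal{N}(\mu, \sigma^2), the mean absolute deviation satisfies
\mathbb{E}\,\lvert x - \mu \rvert = \sqrt{\frac{2}{\pi}}\;\sigma
\quad\Longrightarrow\quad
\sigma_{L2} \approx \sqrt{\frac{\pi}{2}}\;\sigma_{L1}
```

Under that Gaussian assumption the two normalized outputs differ only by a constant factor, which a following learnable scale parameter can absorb.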

Findings

01

L1BN achieves accuracy and convergence comparable to L2BN.

02

On FPGA, L1BN speeds up training by 1.5x and reduces power consumption by 50%.

03

L1BN enables fully quantized DNN training for future hardware deployment.

Abstract

Batch Normalization (BN) has been proven to be quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory, and generally slows down the training process by a large margin, which aggravates the training effort. Furthermore, the nonlinear square and square-root operations in BN impede low bit-width quantization techniques, which have drawn much attention in the deep learning hardware community. In this work, we propose an L1-norm BN (L1BN) with only linear operations in both the forward and the backward propagations during training. L1BN is shown to be approximately equivalent to the original L2-norm BN (L2BN) up to multiplication by a scaling factor. Experiments on various convolutional neural networks (CNNs) and generative adversarial networks (GANs) reveal that L1BN maintains almost the same accuracies and…
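The approximate equivalence between the two normalizations can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the `eps` value, and the synthetic Gaussian batch are all assumptions made for illustration, and the learnable affine parameters and the backward pass are omitted.

```python
import math
import random


def l2_scale(xs, eps=1e-5):
    """L2BN statistic: batch standard deviation (square and square root)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs) + eps)


def l1_scale(xs, eps=1e-5):
    """L1BN statistic: mean absolute deviation (absolute value only)."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs) + eps


def normalize(xs, scale):
    """Center the batch and divide by the chosen scale statistic."""
    mu = sum(xs) / len(xs)
    return [(x - mu) / scale for x in xs]


# Hypothetical activations for one channel, drawn from a Gaussian.
random.seed(0)
batch = [random.gauss(2.0, 3.0) for _ in range(100_000)]

s_l2 = l2_scale(batch)
s_l1 = l1_scale(batch)
y_l1 = normalize(batch, s_l1)
y_l2 = normalize(batch, s_l2)

# For Gaussian data the ratio is close to sqrt(pi / 2), about 1.2533.
ratio = s_l2 / s_l1
print(ratio, math.sqrt(math.pi / 2))
```

Because both outputs share the same centered numerator, every element of `y_l1` equals the matching element of `y_l2` multiplied by `ratio`. For non-Gaussian activations the ratio will differ from sqrt(pi / 2), so the equivalence is only approximate.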
