LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Jiawei Zhao; Steve Dai; Rangharajan Venkatesan; Brian Zimmer; Mustafa Ali; Ming-Yu Liu; Brucek Khailany; Bill Dally; Anima Anandkumar

arXiv:2106.13914·cs.LG·August 24, 2022

Open Access

TL;DR

LNS-Madam introduces a low-precision training framework using a logarithmic number system and multiplicative updates, achieving high accuracy and significant energy savings in neural network training.

Contribution

It jointly designs a logarithmic number system with a multiplicative weight update algorithm for stable low-precision neural network training.

Findings

1. Achieves comparable accuracy to full-precision models with 8-bit precision.

2. Reduces energy consumption by over 90% compared to FP32.

3. Provides a hardware design that enhances efficiency of LNS computations.

Abstract

Representing deep neural networks (DNNs) in low-precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low-precision typically keep a copy of weights in high-precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even if the precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing…
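The pairing of a logarithmic number system with a multiplicative update can be illustrated with a minimal sketch. It is not the paper's implementation: the number format (a sign plus an integer exponent `e`, decoded as `sign * 2**(e / gamma)`), the parameters `gamma`, `bits`, and `lr`, and all function names are assumptions chosen for illustration. The point it shows is that multiplying a weight by a power of two amounts to adding an integer to its stored exponent, so the weight never has to leave the low-precision format during the update.

```python
import math

def lns_quantize(x, gamma=8, bits=8):
    """Encode a nonzero real x as (sign, e) with x ~= sign * 2**(e / gamma).

    gamma and bits describe a hypothetical format, not the paper's exact one.
    """
    sign = 1.0 if x >= 0 else -1.0
    e = round(math.log2(abs(x)) * gamma)   # nearest integer exponent
    e_max = 2 ** (bits - 1) - 1
    e = max(-e_max, min(e_max, e))         # clamp to the representable range
    return sign, e

def lns_decode(sign, e, gamma=8):
    """Map (sign, e) back to a real number."""
    return sign * 2.0 ** (e / gamma)

def multiplicative_update(sign, e, grad, lr=0.5, gamma=8):
    """Apply w <- w * 2**(-lr * sign(w) * grad) directly on the exponent.

    In the log domain the multiplication becomes an integer addition.
    """
    step = round(gamma * lr * sign * grad)
    return sign, e - step

# Usage: quantize a weight, then take one update step in the log domain.
s, e = lns_quantize(3.0)
w_q = lns_decode(s, e)
s2, e2 = multiplicative_update(s, e, grad=1.0)
w_new = lns_decode(s2, e2)
```

Under these assumptions, rounding the exponent to the nearest integer bounds the relative quantization error by `2**(1 / (2 * gamma)) - 1` for values inside the clamped range, independent of the weight's magnitude.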

Taxonomy

Topics: Numerical Methods and Algorithms · Cryptography and Residue Arithmetic · Parallel Computing and Optimization Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · WordPiece · Dropout · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection