LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, Mustafa Ali, Ming-Yu Liu, Brucek Khailany, Bill Dally, Anima Anandkumar

TL;DR
LNS-Madam introduces a low-precision training framework that pairs a logarithmic number system (LNS) with multiplicative weight updates. It reaches accuracy comparable to full precision at 8 bits while substantially reducing the energy cost of neural network training.
Contribution
It jointly designs a logarithmic number system with a multiplicative weight update algorithm for stable low-precision neural network training.
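The Python sketch below makes the number format concrete by quantizing values to a base-2 logarithmic representation. Its convention is chosen for illustration and is not taken from the paper. Magnitudes are assumed to be pre-scaled into (0, 1]. Each value is stored as a sign plus an unsigned integer exponent x, with |v| = 2^(-x/gamma). The names lns_encode, lns_decode, gamma and bits are all hypothetical.

```python
import math


def lns_encode(value, gamma=8, bits=8):
    """Quantize a real value to a hypothetical LNS code (sign, x).

    Assumed convention (illustrative only): |value| lies in (0, 1] and is
    stored as 2 ** (-x / gamma), where x is an unsigned integer that fits
    in (bits - 1) bits; the remaining bit holds the sign.
    """
    sign = -1 if value < 0 else 1
    max_x = 2 ** (bits - 1) - 1
    mag = abs(value)
    if mag == 0.0:
        # Zero is not representable in this format; clamp to the smallest magnitude.
        return sign, max_x
    x = round(-gamma * math.log2(mag))  # nearest integer exponent
    x = min(max(x, 0), max_x)           # clamp into the representable range
    return sign, x


def lns_decode(sign, x, gamma=8):
    """Map an LNS code back to a real value: sign * 2 ** (-x / gamma)."""
    return sign * 2.0 ** (-x / gamma)
```

With gamma = 8, any in-range value decodes to within a factor of 2^(1/16) of the original. At a fixed bit width, a larger gamma gives finer resolution but a narrower dynamic range.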
Findings
Achieves accuracy comparable to full-precision models using 8-bit precision.
Reduces energy consumption by over 90% compared to FP32.
Provides a hardware design that improves the efficiency of LNS computations.
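LNS suits hardware because multiplication becomes integer addition of exponents. Accumulation is harder, since it needs a conversion back to linear form. The Python sketch below shows one way to do that conversion. It is assumed here for illustration and is not taken from the paper's datapath:

- Split each exponent as x = q*gamma + r.
- Realize 2^(-q) as a shift.
- Keep a separate partial sum per remainder r, so the gamma constants 2^(-r/gamma) are applied once per bucket.

It uses the hypothetical convention |v| = 2^(-x/gamma). The names lns_mul, lns_dot and frac_bits are illustrative.

```python
def lns_mul(a, b):
    """Multiply two LNS codes (sign, x): signs multiply and exponents add."""
    (sa, xa), (sb, xb) = a, b
    return sa * sb, xa + xb


def lns_dot(ws, acts, gamma=8, frac_bits=16):
    """Dot product of two equal-length lists of LNS codes.

    Each product 2 ** (-x / gamma) is split as 2 ** (-q) * 2 ** (-r / gamma),
    with x = q * gamma + r and 0 <= r < gamma. The 2 ** (-q) factor is a right
    shift of a fixed-point one; partial sums are bucketed by r, and each bucket
    is scaled by its constant 2 ** (-r / gamma) only once at the end.
    """
    lut = [2.0 ** (-r / gamma) for r in range(gamma)]  # gamma fractional constants
    one = 1 << frac_bits                                # fixed-point 1.0
    buckets = [0] * gamma                               # integer partial sums per remainder
    for w, a in zip(ws, acts):
        s, x = lns_mul(w, a)
        q, r = divmod(x, gamma)
        buckets[r] += s * (one >> q)                    # shift plus sign; no general multiplier
    # Final scaling: one constant multiply per bucket instead of one per product.
    return sum(b * c for b, c in zip(buckets, lut)) / one
```

In this sketch a dot product needs only gamma constant multiplies, regardless of vector length.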
Abstract
Representing deep neural networks (DNNs) in low precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low precision typically keep a copy of weights in high precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even if the precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing…
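The low quantization error during weight updates comes from performing the update in the same log domain in which the weights are stored. The Python sketch below assumes a base-2 form of the Madam rule, w <- w * 2^(-lr * sign(w) * g). It also assumes the hypothetical convention |w| = 2^(-x/gamma) with integer x. Taking log2 turns the rule into x <- x + gamma * lr * sign(w) * g, so only the step is rounded. The rounding error in |w| is then at most a factor of 2^(1/(2*gamma)), regardless of the weight's magnitude. The function name madam_lns_step and the per-tensor RMS gradient normalization are simplifications for illustration. The published Madam optimizer normalizes gradients with a running average of squared gradients instead.

```python
import math


def madam_lns_step(codes, grads, lr, gamma=8, bits=8):
    """One multiplicative weight update applied directly to LNS codes.

    codes: list of (sign, x) with |w| = 2 ** (-x / gamma) (hypothetical format).
    grads: list of float gradients, one per weight.
    Assumed base-2 multiplicative rule: w <- w * 2 ** (-lr * sign(w) * g_star).
    In log2 form this is an additive update on the integer exponent:
        x <- x + gamma * lr * sign(w) * g_star
    """
    # Simplified normalization: divide by the RMS of this gradient tensor.
    rms = math.sqrt(sum(g * g for g in grads) / len(grads)) or 1.0
    max_x = 2 ** (bits - 1) - 1
    updated = []
    for (s, x), g in zip(codes, grads):
        g_star = g / rms
        # Only the step is rounded; the stored exponent stays an exact integer.
        x_new = round(x + gamma * lr * s * g_star)
        # Clamp to the representable range; the sign s is never modified.
        updated.append((s, min(max(x_new, 0), max_x)))
    return updated
```

Steps smaller than half an exponent unit round to zero, so lr * gamma sets the effective update granularity. Because the rule is multiplicative, a weight's sign never changes.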
Taxonomy
Topics: Numerical Methods and Algorithms · Cryptography and Residue Arithmetic · Parallel Computing and Optimization Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · WordPiece · Dropout · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection
