A Power-Efficient Hardware Implementation of L-Mul
Ruiqi Chen, Yangxintong Lyu, Han Bao, Bruno da Silva

TL;DR
This paper introduces a power-efficient FPGA-based approximate FP8 multiplier for the L-Mul algorithm, reducing energy consumption in neural network computations while maintaining accuracy.
Contribution
It presents the first hardware implementation of L-Mul, leveraging FPGA primitives to create an energy-efficient approximate FP8 multiplier for neural networks.
Findings
Achieves reduced energy consumption compared to traditional multipliers.
Maintains acceptable accuracy in neural network inference tasks.
Demonstrates effective deployment in large language model inference.
Abstract
Multiplication is a core operation in modern neural network (NN) computations and contributes significantly to their energy consumption. The linear-complexity multiplication (L-Mul) algorithm was proposed as an approximate multiplication method for emerging NN models, such as large language models (LLMs), to reduce the energy consumption and computational complexity of multiplications. However, no hardware implementation of L-Mul has been reported so far. Meanwhile, 8-bit floating-point (FP8), an emerging data format, offers a wider dynamic range than the traditional 8-bit integer (INT8) format and is increasingly adopted in NN computations. This paper therefore presents a power-efficient FPGA-based hardware implementation (an approximate FP8 multiplier) for L-Mul. The core computation is implemented using the dynamic reconfigurable lookup tables and carry…
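To make the idea of replacing a mantissa multiplication with additions concrete, the following is a minimal software sketch of L-Mul-style approximate multiplication. It is not the FPGA design from the paper. It assumes the rule from the original L-Mul proposal, in which the product of two mantissas (1 + xm)(1 + ym) is approximated by 1 + xm + ym + 2^(-l(m)), with the offset l(m) depending on the mantissa width m. The function names `l_offset` and `l_mul`, the truncating quantization, and the default of 3 mantissa bits (as in the E4M3 variant of FP8) are assumptions made for illustration.

```python
import math


def l_offset(mantissa_bits: int) -> int:
    """Offset exponent l(m); the piecewise rule is assumed from the L-Mul proposal."""
    if mantissa_bits <= 3:
        return mantissa_bits
    if mantissa_bits == 4:
        return 3
    return 4


def l_mul(x: float, y: float, mantissa_bits: int = 3) -> float:
    """Approximate x * y using only additions on the mantissa and exponent fields.

    Hypothetical helper for illustration; it ignores FP8 range limits,
    subnormals, infinities, and NaNs.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0

    # frexp returns f in [0.5, 1) with |x| = f * 2**e; convert to (1 + m) * 2**(e - 1).
    fx, ex = math.frexp(abs(x))
    fy, ey = math.frexp(abs(y))
    mx, ex = fx * 2.0 - 1.0, ex - 1
    my, ey = fy * 2.0 - 1.0, ey - 1

    # Truncate the fractional mantissas to the requested number of bits.
    scale = 1 << mantissa_bits
    mx = math.floor(mx * scale) / scale
    my = math.floor(my * scale) / scale

    # The mantissa product xm * ym is replaced by a constant offset term.
    mantissa = 1.0 + mx + my + 2.0 ** (-l_offset(mantissa_bits))

    # Exponents are added; a mantissa sum >= 2 is absorbed by ldexp.
    return sign * math.ldexp(mantissa, ex + ey)


if __name__ == "__main__":
    for a, b in [(1.5, 1.5), (2.0, 4.0), (-1.5, 1.5)]:
        print(f"{a} * {b}: exact = {a * b}, approx = {l_mul(a, b)}")
```

With 3 mantissa bits, 1.5 × 1.5 is approximated as 2.125 rather than the exact 2.25. In a hardware datapath, the carry out of the mantissa sum would be folded into the exponent, which `math.ldexp` handles implicitly here.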
Taxonomy
Topics: Parallel Computing and Optimization Techniques
