A Power-Efficient Hardware Implementation of L-Mul

Ruiqi Chen; Yangxintong Lyu; Han Bao; Bruno da Silva

arXiv:2412.18948 · cs.AR · December 30, 2024

TL;DR

This paper introduces a power-efficient FPGA-based approximate FP8 multiplier for the L-Mul algorithm, reducing energy consumption in neural network computations while maintaining accuracy.

Contribution

It presents the first hardware implementation of L-Mul, leveraging FPGA primitives to create an energy-efficient approximate FP8 multiplier for neural networks.

Findings

01. Achieves reduced energy consumption compared to traditional multipliers.

02. Maintains acceptable accuracy in neural network inference tasks.

03. Demonstrates effective deployment in large language model inference.

Abstract

Multiplication is a core operation in modern neural network (NN) computations and contributes significantly to their energy consumption. The linear-complexity multiplication (L-Mul) algorithm was proposed as an approximate multiplication method for emerging NN models, such as large language models (LLMs), to reduce the energy consumption and computational complexity of multiplications. However, hardware implementations of L-Mul have not yet been reported. Additionally, 8-bit floating-point (FP8), an emerging data format, offers a better dynamic range than the traditional 8-bit integer (INT8) format, making it increasingly popular and widely adopted in NN computations. This paper thus presents a power-efficient FPGA-based hardware implementation (an approximate FP8 multiplier) for L-Mul. The core computation is implemented using the dynamic reconfigurable lookup tables and carry…
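For readers unfamiliar with L-Mul, the following is a minimal software sketch of the approximation the abstract refers to, not the paper's FPGA design. It assumes the formulation from the original L-Mul proposal, in which the mantissa product is replaced by a constant offset: (1 + x_m)(1 + y_m) is approximated as 1 + x_m + y_m + 2^(-l(m)), with l(m) = m for m ≤ 3, 3 for m = 4, and 4 for m > 4. The function names, the default of 3 mantissa bits (an E4M3-like FP8 layout), and the truncating quantizer are illustrative assumptions; exponent range limits, subnormals, and rounding are ignored.

```python
import math


def l_of_m(m_bits: int) -> int:
    """Offset exponent l(m), assumed from the original L-Mul formulation."""
    if m_bits <= 3:
        return m_bits
    if m_bits == 4:
        return 3
    return 4


def split_float(x: float, m_bits: int) -> tuple[int, float]:
    """Split a positive float into (exponent, mantissa fraction in [0, 1)).

    The mantissa is truncated to m_bits fractional bits (hypothetical quantizer).
    """
    frac, exp = math.frexp(x)      # x = frac * 2**exp with 0.5 <= frac < 1
    exponent = exp - 1             # renormalize so that x = (1 + m) * 2**exponent
    mantissa = frac * 2.0 - 1.0
    scale = 1 << m_bits
    mantissa = math.floor(mantissa * scale) / scale
    return exponent, mantissa


def l_mul(x: float, y: float, m_bits: int = 3) -> float:
    """Approximate x * y using additions only on the mantissa/exponent fields."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    xe, xm = split_float(abs(x), m_bits)
    ye, ym = split_float(abs(y), m_bits)
    # The mantissa product xm * ym is replaced by the constant 2**(-l(m)).
    mantissa_sum = 1.0 + xm + ym + 2.0 ** (-l_of_m(m_bits))
    return sign * mantissa_sum * 2.0 ** (xe + ye)


if __name__ == "__main__":
    for a, b in [(1.5, 1.5), (2.0, 4.0), (-1.5, 1.5)]:
        print(a, b, "exact:", a * b, "l_mul:", l_mul(a, b))
```

With 3 mantissa bits, `l_mul(1.5, 1.5)` evaluates to 2.125 against an exact product of 2.25, which illustrates the kind of bounded error the approximation trades for removing the mantissa multiplier.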

Taxonomy

Topics: Parallel Computing and Optimization Techniques