Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
Dawon Choi, Hana Kim, Ji-Hoon Kim

TL;DR
This paper introduces hardware-efficient Softmax and LayerNorm implementations with guaranteed normalization for edge devices, reducing hardware area by up to 11x (Softmax) and 14x (LayerNorm) with minimal accuracy degradation on GLUE and SQuAD.
Contribution
It proposes novel approximation methods for Softmax and LayerNorm that preserve normalization guarantees, suitable for edge NLP and generative AI applications.
Findings
Achieves high accuracy with minimal degradation on GLUE and SQuAD benchmarks.
Reduces hardware area by up to 11x for Softmax and 14x for LayerNorm.
Designs are synthesized in 28nm CMOS, demonstrating practical hardware efficiency.
Abstract
In Transformer models, non-GEMM (non-General Matrix Multiplication) operations, especially Softmax and Layer Normalization (LayerNorm), often dominate hardware cost due to their nonlinear nature. To address this, previous approximation studies mainly target rank-oriented tasks, which is acceptable for classification. However, edge Natural Language Processing (NLP) applications and edge generative AI are largely evaluated on score-oriented tasks, so normalization-guaranteed non-GEMM operations are essential. We propose a hardware-efficient Softmax and LayerNorm with Guaranteed Normalization for Edge devices. Our design employs hardware-efficient approximation methods while preserving the normalization (Softmax: outputs sum to 1; LayerNorm: zero mean and unit variance). Our architecture is described in Verilog HDL and synthesized using the Samsung 28nm CMOS process. In accuracy evaluation, we…
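The distinction between rank-oriented and score-oriented approximation can be illustrated with a small Python sketch. This is not the authors' design: the piecewise-linear exponent, the power-of-two rounding, and all function names below are hypothetical stand-ins for generic hardware shortcuts. It also assumes the usual definitions of normalization (Softmax outputs sum to 1; LayerNorm outputs, before any learned scale and shift, have zero mean and unit variance). The point is only that a shortcut in the final scaling step can preserve the ranking while breaking the normalization.

```python
import math

def approx_exp(v):
    """Hypothetical hardware-style exponent: e^v = 2^(v*log2(e)),
    with the fractional power approximated as 2^f ~ 1 + f."""
    t = v * math.log2(math.e)
    k = math.floor(t)
    f = t - k
    return (1.0 + f) * 2.0 ** k

def softmax_shift_only(x):
    """Assumed rank-oriented shortcut: the denominator is rounded to a
    power of two so the division becomes a shift. Order is preserved,
    but the outputs no longer sum to 1."""
    m = max(x)
    e = [approx_exp(v - m) for v in x]
    shift = round(math.log2(sum(e)))
    return [v / 2.0 ** shift for v in e]

def softmax_normalized(x):
    """Same approximate exponent, but divided by the actual sum of the
    approximated terms, so the outputs sum to 1 by construction."""
    m = max(x)
    e = [approx_exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def variance(y):
    mu = sum(y) / len(y)
    return sum((v - mu) ** 2 for v in y) / len(y)

def layernorm_shift_only(x):
    """Assumed shortcut: 1/sqrt(var) rounded to a power of two.
    The mean is still 0, but the variance is generally not 1."""
    mu = sum(x) / len(x)
    inv_std = 1.0 / math.sqrt(variance(x))
    scale = 2.0 ** round(math.log2(inv_std))
    return [(v - mu) * scale for v in x]

def layernorm_exact(x):
    """Reference LayerNorm without learned scale or shift."""
    mu = sum(x) / len(x)
    inv_std = 1.0 / math.sqrt(variance(x))
    return [(v - mu) * inv_std for v in x]

scores = [1.0, 2.0, 3.0]
print(sum(softmax_shift_only(scores)))   # deviates from 1
print(sum(softmax_normalized(scores)))   # 1 up to rounding error

acts = [1.0, 2.0, 3.0, 4.0]
print(variance(layernorm_shift_only(acts)))  # deviates from 1
print(variance(layernorm_exact(acts)))       # 1 up to rounding error
```

In this sketch both Softmax variants pick the same largest element, which is why the shortcut is harmless for classification-style (rank-oriented) use; only the variant that divides by the actual sum produces values usable as probabilities in score-oriented tasks.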
