Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices

Dawon Choi; Hana Kim; Ji-Hoon Kim

arXiv:2604.23647 · cs.AR · April 28, 2026

TL;DR

This paper introduces hardware-efficient Softmax and LayerNorm implementations for edge devices that guarantee normalization, reducing hardware area by up to 11x (Softmax) and 14x (LayerNorm) with minimal accuracy degradation on GLUE and SQuAD.

Contribution

The paper proposes approximation methods for Softmax and LayerNorm that preserve normalization guarantees (Softmax: $\sum p = 1$; LayerNorm: $\sigma = 1$), making them suitable for score-oriented edge NLP and generative AI applications.

Findings

01

Achieves high accuracy with minimal degradation on GLUE and SQuAD benchmarks.

02

Reduces hardware area by up to 11x for Softmax and 14x for LayerNorm.

03

Designs are described in Verilog HDL and synthesized in the Samsung 28nm CMOS process, demonstrating practical hardware efficiency.

Abstract

In Transformer models, non-GEMM (non-General Matrix Multiplication) operations, especially Softmax and Layer Normalization (LayerNorm), often dominate hardware cost because of their nonlinear nature. Previous approximation studies mainly target rank-oriented tasks, where only the ordering of the outputs matters; this is acceptable for classification. However, edge Natural Language Processing (NLP) applications and edge generative AI are largely evaluated on score-oriented tasks, so normalization-guaranteed non-GEMM operations are essential. We propose a hardware-efficient Softmax and LayerNorm with Guaranteed Normalization for Edge devices. Our design employs hardware-efficient approximation methods while preserving normalization (Softmax: $\sum p = 1$; LayerNorm: $\sigma = 1$). Our architecture is described in Verilog HDL and synthesized using the Samsung 28nm CMOS process. In accuracy evaluation, we…
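The difference between a rank-preserving approximation and a normalization-preserving one can be illustrated with a small floating-point sketch. This is not the paper's architecture: the piecewise-linear `approx_exp2`, the power-of-two divisor, and the Newton-Raphson inverse square root are generic approximation techniques chosen here only for illustration, and all function names are hypothetical. The LayerNorm variants omit the learned scale and bias and assume the input has nonzero variance.

```python
import math

LOG2E = math.log2(math.e)  # e**x == 2**(x * LOG2E)

def approx_exp2(x):
    """Piecewise-linear 2**x: exact at integers, linear in between."""
    i = math.floor(x)
    return math.ldexp(1.0 + (x - i), i)  # (1 + frac) * 2**i

def _approx_exps(logits):
    m = max(logits)  # subtract the max so every exponent is <= 0
    return [approx_exp2((v - m) * LOG2E) for v in logits]

def softmax_shift_only(logits):
    """Illustrative rank-oriented variant: the divisor is rounded to a
    power of two, so ordering is kept but the outputs need not sum to 1."""
    e = _approx_exps(logits)
    shift = round(math.log2(sum(e)))
    return [math.ldexp(v, -shift) for v in e]

def softmax_normalized(logits):
    """Illustrative score-oriented variant: the exponentials are still
    approximate, but dividing by their actual sum restores sum(p) = 1."""
    e = _approx_exps(logits)
    s = sum(e)
    return [v / s for v in e]

def _center(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def _variance(centered):
    return sum(v * v for v in centered) / len(centered)

def std(x):
    return math.sqrt(_variance(_center(x)))

def _coarse_rsqrt(var):
    """Power-of-two estimate of 1/sqrt(var); assumes var > 0."""
    return math.ldexp(1.0, -round(math.log2(var) / 2))

def layernorm_coarse(x):
    """Scale by the power-of-two estimate only; output sigma drifts from 1."""
    c = _center(x)
    y = _coarse_rsqrt(_variance(c))
    return [v * y for v in c]

def layernorm_refined(x, steps=4):
    """Refine the estimate with Newton-Raphson so output sigma approaches 1."""
    c = _center(x)
    var = _variance(c)
    y = _coarse_rsqrt(var)
    for _ in range(steps):
        y = y * (1.5 - 0.5 * var * y * y)  # Newton step for 1/sqrt(var)
    return [v * y for v in c]
```

Calling `softmax_shift_only` and `softmax_normalized` on the same logits gives the same argmax, but only the second returns values that sum to 1; likewise `std(layernorm_refined(x))` is close to 1 while `std(layernorm_coarse(x))` generally is not. A score-oriented metric that consumes the probabilities or normalized activations directly would see that difference, whereas a classification decision would not.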
