IterL2Norm: Fast Iterative L2-Normalization

ChangMin Ye; Yonguk Sim; Youngchae Kim; SeongMin Jin; Doo Seok Jeong

arXiv:2412.04778·cs.LG·January 20, 2025

TL;DR

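This paper introduces IterL2Norm, an iterative L2-normalization method for layer normalization in transformers. It is designed to run on the same chip as the matrix-matrix multiplication engine, which reduces data movement between host and accelerator. It converges within five iteration steps and outperforms the fast inverse square root algorithm in six of nine cases for FP32 and five of nine for BFloat16.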

Contribution

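The paper presents an iterative L2-normalization technique for 1D input, intended for on-chip implementation alongside the matrix-matrix multiplication engine so that layer normalization in transformer models does not add host-accelerator data movement. It is compared against the fast inverse square root algorithm across the embedding lengths used in the OPT models.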

Findings

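01. IterL2Norm converges to the steady-state solution within five iteration steps, with high precision.

02. It outperforms the fast inverse square root algorithm in six of nine cases for FP32 and five of nine for BFloat16, across the embedding lengths used in the OPT models.

03. Implemented in CMOS, it normalizes vectors with a latency of 116-227 cycles.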
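
For context on the baseline named in the findings, the fast inverse square root algorithm approximates 1/sqrt(x) from the FP32 bit pattern and then refines the guess with Newton-Raphson steps. The sketch below is the classic software formulation, not the configuration benchmarked in the paper. The function names `fast_inv_sqrt` and `l2_normalize_fast` and the single refinement step are assumptions made for illustration, and Python performs the refinement in double precision, so this is not a bit-exact FP32 emulation.

```python
import struct


def fast_inv_sqrt(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for positive x within FP32 range (hypothetical helper)."""
    # Reinterpret the FP32 bit pattern of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial guess from the widely published magic constant 0x5F3759DF.
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Newton-Raphson refinement of y ~ 1/sqrt(x); done in double precision here.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def l2_normalize_fast(v):
    """Scale v to (approximately) unit L2 norm using fast_inv_sqrt."""
    s = sum(a * a for a in v)  # squared L2 norm; assumed positive
    r = fast_inv_sqrt(s)
    return [a * r for a in v]
```

With one refinement step the result is only approximate, for example `l2_normalize_fast([3.0, 4.0])` is close to, but not exactly, `[0.6, 0.8]`.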

Abstract

Transformer-based large language models are memory-bound: they operate on large amounts of data that are only marginally reused. Thus, data movement between host and accelerator likely dictates the total wall-clock time. Layer normalization is one of the key workloads in the transformer model, following each multi-head attention block and each feed-forward network block. To reduce data movement, layer normalization needs to be performed on the same chip as the matrix-matrix multiplication engine. To this end, we introduce an iterative L2-normalization method for 1D input (IterL2Norm). It converges quickly to the steady-state solution, within five iteration steps, with high precision, outperforming the fast inverse square root algorithm in six out of nine cases for FP32 and five out of nine for BFloat16 across the embedding lengths used in the OPT models. Implemented in…
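
The link between layer normalization and L2-normalization can be made concrete: for a length-d vector, dividing the mean-centered input by its standard deviation is the same as L2-normalizing it and multiplying by sqrt(d). The sketch below illustrates that identity with an iterative inverse-square-root loop. The abstract does not give the IterL2Norm update rule, so a Newton-Raphson iteration is swapped in as a stand-in; the names `iter_inv_sqrt`, `iter_l2_normalize`, and `layer_norm_via_l2` are hypothetical, the default of five steps only mirrors the bound quoted in the abstract, and the usual epsilon term of layer normalization is omitted.

```python
import math


def iter_inv_sqrt(s: float, steps: int = 5) -> float:
    """Iteratively approximate 1/sqrt(s) for s > 0 (Newton stand-in, not the paper's rule)."""
    # Initial guess from the binary exponent: s = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(s)
    y = 2.0 ** (-e / 2)
    for _ in range(steps):
        # Newton-Raphson update for f(y) = 1/y**2 - s.
        y = y * (1.5 - 0.5 * s * y * y)
    return y


def iter_l2_normalize(v, steps: int = 5):
    """Scale v to unit L2 norm using the iterative reciprocal square root."""
    s = sum(a * a for a in v)  # squared L2 norm
    if s == 0.0:
        return [0.0 for _ in v]  # zero vector: nothing to normalize
    r = iter_inv_sqrt(s, steps)
    return [a * r for a in v]


def layer_norm_via_l2(x, gamma, beta, steps: int = 5):
    """Layer normalization (no epsilon) expressed through L2-normalization."""
    d = len(x)
    mean = sum(x) / d
    centered = [a - mean for a in x]
    unit = iter_l2_normalize(centered, steps)
    scale = math.sqrt(d)  # constant per embedding length; could be precomputed
    return [g * scale * u + b for g, u, b in zip(gamma, unit, beta)]
```

Calling `layer_norm_via_l2(x, gamma, beta)` with all-ones `gamma` and all-zeros `beta` should match the textbook `(x - mean) / std` result up to floating-point error.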

Taxonomy

Topics: Face and Expression Recognition · Neural Networks and Applications · Image Retrieval and Classification Techniques

Methods: Layer Normalization · Attention Is All You Need · Linear Layer · Softmax · OPT · Multi-Head Attention