SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, Ilya Soloveychik

TL;DR
This paper introduces SLaNC, a static LayerNorm calibration method for Transformer models that improves quantized inference by ensuring numerical stability without adding computational overhead.
Contribution
It proposes a simple, offline scaling technique for LayerNorm inputs based on linear layer weights, enhancing quantized Transformer inference stability.
Findings
Ensures no overflow or underflow during LayerNorm computation.
Provides theoretical justification and numerical validation.
Enables resource-efficient, accurate inference across hardware architectures.
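The overflow problem and the static fix summarized above can be illustrated with a small NumPy sketch. It relies on the fact that LayerNorm is invariant to a positive rescaling of its input (up to the epsilon term), so a constant computed offline can be folded into the preceding linear layer's weights at no runtime cost. The scale used here, sqrt(d_out) / ||W||_F, is an illustrative assumption and not necessarily the formula derived in the paper. The helper names (layernorm_fp16, layernorm_ref, static_scale) are likewise hypothetical.

```python
import numpy as np

def layernorm_ref(x, eps=1e-5):
    """High-precision LayerNorm (no affine parameters), used as ground truth."""
    x = x.astype(np.float64)
    c = x - x.mean()
    return c / np.sqrt(np.mean(c * c) + eps)

def layernorm_fp16(x, eps=1e-5):
    """LayerNorm with every intermediate kept in float16 to mimic a low-precision unit."""
    x = x.astype(np.float16)
    c = x - x.mean(dtype=np.float16)
    var = np.mean(c * c, dtype=np.float16)  # squares above ~65504 become inf
    return c / np.sqrt(var + np.float16(eps))

def static_scale(W):
    """Assumed offline scale: brings the typical output magnitude of W @ h near 1."""
    return np.sqrt(W.shape[0]) / np.linalg.norm(W)  # Frobenius norm by default

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32) * 50.0  # large-magnitude weights
h = rng.normal(size=64).astype(np.float32)               # incoming activations

y = W @ h                      # LayerNorm input, entries in the hundreds
ref = layernorm_ref(y)

# Naive float16 path: squaring the centered values overflows.
with np.errstate(over="ignore"):
    naive = layernorm_fp16(y)

# Calibrated path: the scale is folded into the weights once, offline.
W_scaled = W * static_scale(W)
calibrated = layernorm_fp16(W_scaled @ h)

print("naive max abs error:     ", np.max(np.abs(naive.astype(np.float64) - ref)))
print("calibrated max abs error:", np.max(np.abs(calibrated.astype(np.float64) - ref)))
```

In this sketch the uncalibrated float16 path loses the result because the variance accumulates to infinity, while the rescaled path stays close to the high-precision reference. No extra operation runs at inference time, since the scale lives in the stored weights.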
Abstract
The ever-increasing sizes of Large Language Models (LLMs) beyond hundreds of billions of parameters have generated enormous pressure on the manufacturers of dedicated hardware accelerators and made the innovative design of the latter one of the most rapidly expanding fields of the AI industry. Various approaches have been explored to enable efficient and accurate processing of LLMs on the available accelerators given their computational and storage limitations. Among these, quantization techniques have become the main focus of the community as a means of reducing the compute, communication and storage requirements. Quantization to lower-precision formats naturally poses a number of challenges caused by the limited range of the available value representations. When it comes to processing the popular Transformer models on hardware, one of the main issues becomes the calculation of the…
Taxonomy
Topics: Medical Imaging Techniques and Applications · Radiation Detection and Scintillator Technologies
Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer
