SLaNC: Static LayerNorm Calibration

Mahsa Salmani; Nikita Trukhanov; Ilya Soloveychik

arXiv:2410.10553 · cs.LG · October 15, 2024

TL;DR

This paper introduces SLaNC, a static LayerNorm calibration method for Transformer models that improves quantized inference by ensuring numerical stability without adding computational overhead.
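A scaling factor fixed offline can be applied to the LayerNorm input only because LayerNorm is invariant to positive rescaling of its input, up to the ε term. The derivation below uses the standard LayerNorm definition for $x \in \mathbb{R}^d$. The symbol $\alpha$ for the scaling constant is our notation, not necessarily the paper's.

```latex
\begin{align*}
\mathrm{LN}_\varepsilon(x) &= \gamma \odot \frac{x - \mu(x)}{\sqrt{\sigma^2(x) + \varepsilon}} + \beta, \\
\mu(x) &= \frac{1}{d}\sum_{i=1}^{d} x_i, \qquad
\sigma^2(x) = \frac{1}{d}\sum_{i=1}^{d}\bigl(x_i - \mu(x)\bigr)^2, \\
\mathrm{LN}_\varepsilon(\alpha x) &= \gamma \odot \frac{\alpha\,\bigl(x - \mu(x)\bigr)}{\sqrt{\alpha^2\sigma^2(x) + \varepsilon}} + \beta
  = \mathrm{LN}_{\varepsilon/\alpha^2}(x), \qquad \alpha > 0.
\end{align*}
```

Scaling the input by α therefore leaves the output exactly unchanged if ε is replaced by α²ε, because then $\mathrm{LN}_{\alpha^2\varepsilon}(\alpha x) = \mathrm{LN}_\varepsilon(x)$. Without that replacement, the output changes only negligibly as long as both ε and ε/α² are small compared with σ²(x).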

Contribution

It proposes a simple, offline technique for scaling LayerNorm inputs based on the weights of the preceding linear layers, which improves the numerical stability of quantized Transformer inference.
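The following is a minimal pure-Python sketch of the idea under stated assumptions. A single bias-free linear layer feeds the LayerNorm, and the static scale is taken to be the Frobenius norm of its weight matrix. That formula and the helper names (`static_scale`, `matvec`, `layer_norm`) are illustrative assumptions, not the exact scaling rule derived in the paper. The sketch checks that dividing the LayerNorm input by this fixed scale leaves the normalized output unchanged.

```python
import math

def frobenius_norm(w):
    """Frobenius norm of a weight matrix stored as a list of rows."""
    return math.sqrt(sum(v * v for row in w for v in row))

def static_scale(w):
    """Hypothetical static scale: the Frobenius norm of the preceding linear
    layer's weights. It depends only on the weights, so it is computed once,
    offline. Since ||W x|| <= ||W||_F * ||x||, dividing W x by it keeps the
    LayerNorm input no larger (in L2 norm) than the linear layer's input."""
    return frobenius_norm(w)

def matvec(w, x):
    """Bias-free linear layer: y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def layer_norm(x, eps=1e-5):
    """LayerNorm over one vector, without the affine gamma/beta parameters."""
    d = len(x)
    mu = sum(x) / d
    var = sum((v - mu) ** 2 for v in x) / d
    return [(v - mu) / math.sqrt(var + eps) for v in x]

# Toy linear layer with large weights, so its outputs are large in magnitude.
W = [[300.0, -120.0, 45.0],
     [-80.0, 510.0, 20.0],
     [60.0, 15.0, -700.0]]
x = [1.5, -2.0, 0.75]

s = static_scale(W)              # fixed offline and reused for every input
y = matvec(W, x)                 # uncalibrated LayerNorm input
y_scaled = [v / s for v in y]    # calibrated LayerNorm input

plain = layer_norm(y)
# With eps also divided by s**2, both computations are mathematically identical.
calibrated = layer_norm(y_scaled, eps=1e-5 / s ** 2)
print(max(abs(a - b) for a, b in zip(plain, calibrated)))
```

The toy layer has no bias, so dividing its output by s is the same as dividing W by s ahead of time. In this setting the calibration could therefore be stored in the weights rather than applied at run time.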

Findings

01. Ensures no overflow or underflow during LayerNorm computation.
02. Provides theoretical justification and numerical validation.
03. Enables resource-efficient, accurate inference across hardware architectures.
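The first finding concerns the limited range of low-precision formats. The sketch below is an illustration, not the paper's experiment. It emulates FP16 arithmetic with the half-precision `struct` format code `'e'` and accumulates the sum of squares that a LayerNorm variance computation needs. The input vector, the scale factors, and the helpers `to_fp16` and `fp16_sum_of_squares` are hypothetical, and real accelerators may accumulate in a wider format.

```python
import math
import struct

def to_fp16(v):
    """Round a Python float to the nearest IEEE half-precision value.
    Values beyond the FP16 range become +/-inf, emulating overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)

def fp16_sum_of_squares(x, scale=1.0):
    """Sum of squares of x / scale, with every intermediate rounded to FP16."""
    acc = 0.0
    for v in x:
        q = to_fp16(v / scale)               # the (scaled) input stored in FP16
        acc = to_fp16(acc + to_fp16(q * q))  # FP16 product and FP16 accumulation
    return acc

x = [220.0, -180.0, 150.0, -240.0, 90.0, 200.0, -140.0, 170.0]

too_large = fp16_sum_of_squares(x)             # 48400 + 32400 > 65504 -> inf
calibrated = fp16_sum_of_squares(x, 2.0 ** 6)  # roughly 257500 / 4096, finite
too_small = fp16_sum_of_squares(x, 2.0 ** 24)  # every square underflows to 0.0

print(too_large, calibrated, too_small)
```

The factor 2**6 was picked by hand for this toy vector. It is a power of two, so the rescaling itself is exact in FP16. SLaNC's aim is to obtain such a factor statically, before inference, so that neither overflow nor underflow occurs.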

Abstract

The ever-increasing sizes of Large Language Models (LLMs) beyond hundreds of billions of parameters have generated enormous pressure on the manufacturers of dedicated hardware accelerators and made the innovative design of the latter one of the most rapidly expanding fields of the AI industry. Various approaches have been explored to enable efficient and accurate processing of LLMs on the available accelerators given their computational and storage limitations. Among these, quantization techniques have become the main focus of the community as a means of reducing the compute, communication and storage requirements. Quantization to lower precision formats naturally poses a number of challenges caused by the limited range of the available value representations. When it comes to processing the popular Transformer models on hardware, one of the main issues becomes calculation of the…

Taxonomy

Topics: Medical Imaging Techniques and Applications · Radiation Detection and Scintillator Technologies

Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer