OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung

TL;DR
OPAL is a hardware-software co-designed accelerator for generative large language models. It combines microscaling-based activation quantization that preserves outliers with mixed precision (3-bit and 5-bit inputs), improving energy efficiency by 1.6 to 2.2 times and reducing area by 2.4 to 3.1 times with a perplexity increase below 1.
Contribution
The paper introduces a new activation quantization method with outlier preservation and mixed precision, along with a specialized hardware architecture for efficient LLM acceleration.
Findings
Energy efficiency improved by 1.6 to 2.2 times
Area reduced by 2.4 to 3.1 times
Negligible accuracy loss (<1 perplexity increase)
Abstract
To overcome the burden on memory size and bandwidth caused by the ever-increasing size of large language models (LLMs), aggressive weight quantization has recently been studied, while research on quantizing activations is still lacking. In this paper, we present a hardware-software co-design method that results in an energy-efficient LLM accelerator, named OPAL, for generation tasks. First, we propose a novel activation quantization method that leverages the microscaling data format while preserving several outliers per sub-tensor block (e.g., four out of 128 elements). Second, on top of preserving outliers, mixed precision is used: inputs to sensitive layers in the decoder block of an LLM are set to 5-bit, while inputs to less sensitive layers are kept at 3-bit. Finally, we present the OPAL hardware architecture that consists of FP units for handling outliers and vectorized INT…
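The quantization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that outliers are the largest-magnitude elements of a block, that the remaining elements share one power-of-two scale (in the spirit of microscaling formats), and that codes are symmetric signed integers. The names `quantize_block` and `dequantize_block` and all parameters are hypothetical.

```python
import math


def quantize_block(block, bits=3, num_outliers=4):
    """Quantize one block; returns (scale_exp, codes, outliers).

    Assumption: the num_outliers largest-magnitude elements are treated as
    outliers and kept unquantized; the rest share one power-of-two scale.
    """
    order = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
    outlier_idx = set(order[:num_outliers])
    outliers = {i: block[i] for i in outlier_idx}  # preserved at full precision

    qmax = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. 3 for 3-bit
    rest_max = max(
        (abs(x) for i, x in enumerate(block) if i not in outlier_idx),
        default=0.0,
    )
    # Smallest power-of-two scale such that rest_max / scale <= qmax.
    scale_exp = math.ceil(math.log2(rest_max / qmax)) if rest_max > 0 else 0
    scale = 2.0 ** scale_exp

    codes = []
    for i, x in enumerate(block):
        if i in outlier_idx:
            codes.append(0)  # placeholder; the real value lives in `outliers`
        else:
            q = round(x / scale)
            codes.append(max(-qmax, min(qmax, q)))
    return scale_exp, codes, outliers


def dequantize_block(scale_exp, codes, outliers):
    """Rebuild the block: outliers verbatim, other elements as code * scale."""
    scale = 2.0 ** scale_exp
    return [outliers.get(i, c * scale) for i, c in enumerate(codes)]


# Example block of 128 elements: 124 small values plus 4 large outliers.
block = [0.1 * i for i in range(-60, 64)] + [50.0, -40.0, 30.0, 25.0]

# Mixed precision in this sketch is just a per-layer choice of `bits`.
exp3, codes3, out3 = quantize_block(block, bits=3)
exp5, codes5, out5 = quantize_block(block, bits=5)
recon3 = dequantize_block(exp3, codes3, out3)
recon5 = dequantize_block(exp5, codes5, out5)
```

Because the four large values are removed before the shared scale is chosen, the scale is set by the remaining small values rather than by the outliers, which is why preserving a few outliers per block lets the other elements survive at 3 or 5 bits.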
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Softmax
