Post Training Quantization of Large Language Models with Microscaling Formats
Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang

TL;DR
This paper studies how three post-training quantization (PTQ) methods, SmoothQuant, AWQ, and GPTQ, behave in combination, extends them to microscaling (MX) formats, and shows that quantizing to 4-bit weights and 8-bit activations in the MXINT format incurs negligible accuracy loss.
Contribution
It extends PTQ methods beyond their original fixed-point targets to microscaling (MX) formats and systematically analyzes how SmoothQuant, AWQ, and GPTQ interact when applied together.
Findings
Quantization to 4-bit weights and 8-bit activations in the MXINT format is achievable with negligible accuracy loss.
Combining different PTQ methods (SmoothQuant, AWQ, GPTQ) is what makes this low-precision setting reachable.
Extending PTQ to microscaling formats broadens the applicability of these algorithms beyond their original fixed-point format targets.
Abstract
Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of three well-known post-training techniques, SmoothQuant, AWQ, and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of these methods by enabling quantization to microscaling (MX) formats, extending the applicability of these PTQ algorithms beyond their original fixed-point format targets. We show that combining different PTQ methods enables us to quantize models to 4-bit weights and 8-bit activations using the MXINT format with negligible accuracy loss compared to the…
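To make the idea of a microscaling format concrete, here is a minimal sketch of block-wise integer quantization with a shared power-of-two scale. It is an illustration under stated assumptions, not the paper's implementation: the block size of 32, the scale-selection rule, and the function names (`mx_quantize_block`, `mx_dequantize_block`, `mx_fake_quantize`) are all hypothetical choices made for this example, and the abstract above does not specify them.

```python
import math


def mx_quantize_block(values, bits=4):
    """Quantize one block to signed integers plus a shared exponent.

    Assumption: the shared scale is the smallest power of two that maps
    the block's largest magnitude into the signed integer range. Real MX
    implementations may pick the exponent differently.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    # Round to nearest, then clamp to guard against floating-point edge cases.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exp, q


def mx_dequantize_block(exp, q):
    """Map integers back to real values using the shared power-of-two scale."""
    scale = 2.0 ** exp
    return [scale * x for x in q]


def mx_fake_quantize(tensor, bits=4, block_size=32):
    """Quantize and dequantize a flat list block by block.

    Assumption: block_size=32 is illustrative; each block gets its own scale.
    """
    out = []
    for start in range(0, len(tensor), block_size):
        block = tensor[start:start + block_size]
        exp, q = mx_quantize_block(block, bits)
        out.extend(mx_dequantize_block(exp, q))
    return out


if __name__ == "__main__":
    weights = [0.1 * i - 1.6 for i in range(64)]
    approx = mx_fake_quantize(weights, bits=4)
    worst = max(abs(w - a) for w, a in zip(weights, approx))
    print(f"max abs error at 4 bits: {worst:.4f}")
```

Because each block carries its own scale, an outlier only degrades the resolution of the block it falls in rather than that of the whole tensor, which is the property that distinguishes this layout from per-tensor fixed-point quantization.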
Taxonomy
Topics: Natural Language Processing Techniques
Methods: OPT · LLaMA
