How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag Patel, Markus Nagel

TL;DR
This paper compares three parameterizations of asymmetric uniform quantization in quantization-aware training, analyzing their effects on training stability and proposing best practices for improved performance.
Contribution
It introduces a comprehensive analysis of three asymmetric quantization parameterizations and offers best practices to enhance quantization-aware training stability and speed.
Findings
Scale and offset parameterization performs best under certain conditions.
Minimum and maximum parameterization shows sensitivity to hyperparameters.
Beta and gamma parameterization offers a flexible alternative.
Abstract
This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.
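To make the three parameterizations concrete, here is a minimal pure-Python sketch of how each one can be mapped to the same asymmetric uniform quantizer. It is an illustration under stated assumptions, not the authors' implementation. The function names, the offset sign convention, and the rounding mode are hypothetical choices. The beta/gamma form used here, which rescales a fixed initial minimum and maximum, is one common formulation and is assumed rather than taken from this summary.

```python
def fake_quant(x, scale, offset, bits):
    """Quantize then dequantize each value on a grid of 2**bits integer levels."""
    qmax = 2 ** bits - 1
    out = []
    for v in x:
        q = round(v / scale) + offset      # map to the integer grid
        q = min(max(q, 0), qmax)           # clamp to the representable range
        out.append(scale * (q - offset))   # map back to real values
    return out


def from_min_max(t_min, t_max, bits):
    """Min/max parameterization: derive (scale, offset) from range endpoints."""
    scale = (t_max - t_min) / (2 ** bits - 1)
    offset = round(-t_min / scale)
    return scale, offset


def from_beta_gamma(beta, gamma, x_min, x_max, bits):
    """Beta/gamma parameterization (assumed form): rescale fixed initial endpoints."""
    return from_min_max(beta * x_min, gamma * x_max, bits)


# Scale/offset parameterization: (scale, offset) are used directly.
x = [-1.0, -0.25, 0.0, 0.5, 2.0]
bits = 4
scale, offset = from_min_max(min(x), max(x), bits)
x_hat = fake_quant(x, scale, offset, bits)
```

At initialization all three describe the same grid; in training they differ in which quantities receive gradients (scale and offset, the two endpoints, or the two multipliers), which is where bit width and learning rate can affect them differently.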
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus
