How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training

Jaeseong You; Minseop Park; Kyunggeun Lee; Seokjun An; Chirag Patel; Markus Nagel

arXiv:2404.16898·cs.LG·April 29, 2024

Open Access

TL;DR

This paper compares three parameterizations of asymmetric uniform quantization in quantization-aware training, analyzing their effects on training stability and proposing best practices for improved performance.

Contribution

It introduces a comprehensive analysis of three asymmetric quantization parameterizations and offers best practices to enhance quantization-aware training stability and speed.

Findings

01. Scale and offset parameterization performs best under certain conditions.

02. Minimum and maximum parameterization shows sensitivity to hyperparameters.

03. Beta and gamma parameterization offers a flexible alternative.

Abstract

This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.
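The abstract names the three parameterizations without giving their formulas. The sketch below shows one common way to relate them, as an illustration only and not the authors' implementation. It assumes an unsigned b-bit integer grid, a min/max form that derives scale and offset from the range endpoints, and a beta/gamma form that multiplies fixed initial endpoints by learnable factors. The function names (`fake_quant`, `from_min_max`, `from_beta_gamma`) are hypothetical.

```python
import numpy as np

def fake_quant(x, scale, offset, bits):
    """Quantize then dequantize x on an unsigned `bits`-bit grid (scale/offset form)."""
    qmax = 2 ** bits - 1
    q = np.clip(np.round(x / scale) + offset, 0, qmax)
    return scale * (q - offset)

def from_min_max(t_min, t_max, bits):
    """Assumed mapping: derive (scale, offset) from range endpoints t_min < t_max."""
    qmax = 2 ** bits - 1
    scale = (t_max - t_min) / qmax
    offset = np.round(-t_min / scale)
    return scale, offset

def from_beta_gamma(beta, gamma, x_min, x_max, bits):
    """Assumed mapping: beta and gamma rescale fixed initial endpoints x_min, x_max."""
    return from_min_max(beta * x_min, gamma * x_max, bits)

# In quantization-aware training, the learnable quantities would be
# (scale, offset), (t_min, t_max), or (beta, gamma) respectively.
x = np.array([-1.0, 0.0, 0.5, 2.0])
scale, offset = from_min_max(x.min(), x.max(), bits=8)
x_hat = fake_quant(x, scale, offset, bits=8)
```

With `beta = gamma = 1`, the beta/gamma form reduces to the min/max form under these assumptions. For inputs inside the range, the reconstruction error is bounded by half a quantization step.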

Taxonomy

Topics: Neural Networks and Applications

Methods: Focus