FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee

TL;DR
FlexRound introduces a novel learnable weight-rounding method for post-training quantization that uses element-wise division, enabling effective quantization across various models and tasks, including large language models, with minimal performance loss.
Contribution
This work proposes FlexRound, a new quantization scheme based on element-wise division, allowing joint learning of quantization grid size and per-weight scales, and demonstrates its effectiveness across diverse models and tasks.
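To make the contrast with plain rounding concrete, here is a minimal sketch of division-based rounding, assuming a simplified form in which each weight is divided by the product of a shared grid size and its own scale before rounding. The names `flexround_codes`, `dequantize`, `grid_size`, and `scales` are hypothetical and chosen for illustration. The paper's exact formulation may differ, and the learning of these parameters by output reconstruction is not shown.

```python
import math

def flexround_codes(weights, grid_size, scales, n_bits=4):
    """Map weights to signed integer codes via element-wise division.

    Assumed simplified form: code = clip(round(w / (grid_size * scale))).
    `grid_size` is shared by all weights; `scales` holds one value per weight.
    """
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    codes = []
    for w, s in zip(weights, scales):
        # Round half up, so ties do not depend on Python's banker's rounding.
        q = math.floor(w / (grid_size * s) + 0.5)
        codes.append(max(qmin, min(qmax, q)))
    return codes

def dequantize(codes, grid_size):
    """Place integer codes back on the shared quantization grid."""
    return [grid_size * q for q in codes]

weights = [0.26, -0.11, 0.74]

# With every per-weight scale fixed at 1.0 this is plain round-to-nearest.
nearest = flexround_codes(weights, grid_size=0.1, scales=[1.0, 1.0, 1.0])

# Changing a single per-weight scale moves only that weight to a different grid point.
adjusted = flexround_codes(weights, grid_size=0.1, scales=[1.2, 1.0, 1.0])

print(nearest)   # [3, -1, 7]
print(adjusted)  # [2, -1, 7]
```

In this sketch a scale of 1.0 leaves a weight at its nearest grid point, while a scale slightly above or below 1.0 can shift it to a neighbouring point. That is the degree of freedom a learnable rounding scheme would tune.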
Findings
Effective across image classification, natural language understanding, and natural language generation tasks.
Enables quantization of large language models with minimal performance impact.
According to the authors, the first comprehensive PTQ experiments spanning all of these task types.
Abstract
Post-training quantization (PTQ) has been gaining popularity for deploying deep neural networks on resource-limited devices because, unlike quantization-aware training, it requires neither a full training dataset nor end-to-end training. As PTQ schemes based on reconstructing each layer or block output have proven effective at enhancing quantized model performance, recent works have developed algorithms that devise and learn a new weight-rounding scheme so as to better reconstruct each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of the typical element-wise addition, such that FlexRound enables jointly learning a common quantization grid size as well as a different scale for each pre-trained weight. Thanks to the reciprocal rule of derivatives induced by…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
