RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, and Qingyi Gu

TL;DR
RepQuant introduces a post-training quantization (PTQ) framework for large transformer models that uses complex quantizers during calibration for accuracy and simplified quantizers during inference for efficiency, bridging the two through scale reparameterization.
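To illustrate why the choice of quantizer matters, the following sketch compares a simple per-tensor uniform quantizer with a more complex per-channel one on synthetic activations whose channels have very different ranges. The channel spreads, the 8-bit min/max calibration, and the helper names (minmax_params, fake_quant, relative_mse) are assumptions made for illustration, not RepQuant's actual calibration procedure.

```python
import random

BITS = 8
QMAX = 2 ** BITS - 1

def minmax_params(values):
    """Asymmetric min/max calibration: returns (scale, integer zero point)."""
    lo, hi = min(0.0, min(values)), max(0.0, max(values))
    scale = (hi - lo) / QMAX
    return scale, round(-lo / scale)

def fake_quant(v, scale, zero):
    """Quantize to the integer grid [0, QMAX], then map back to a float."""
    q = min(max(round(v / scale) + zero, 0), QMAX)
    return scale * (q - zero)

def relative_mse(ref, approx):
    """Squared quantization error normalized by the signal energy."""
    err = sum((r - a) ** 2 for r, a in zip(ref, approx))
    return err / sum(r ** 2 for r in ref)

random.seed(0)
# Hypothetical activations: channel spreads differ by two orders of magnitude
channel_std = [0.05, 0.2, 1.0, 5.0]
acts = [[random.gauss(0.0, s) for s in channel_std] for _ in range(256)]
columns = [[row[c] for row in acts] for c in range(len(channel_std))]

# Simple quantizer: one (scale, zero) pair shared by every channel
s_all, z_all = minmax_params([v for col in columns for v in col])
per_tensor = [relative_mse(col, [fake_quant(v, s_all, z_all) for v in col])
              for col in columns]

# Complex quantizer: one (scale, zero) pair per channel
per_channel = []
for col in columns:
    s, z = minmax_params(col)
    per_channel.append(relative_mse(col, [fake_quant(v, s, z) for v in col]))

for c, (pt, pc) in enumerate(zip(per_tensor, per_channel)):
    print(f"channel {c}: per-tensor rel. MSE {pt:.2e}, per-channel rel. MSE {pc:.2e}")
```

The narrow channels suffer most under the shared scale, because its single step size is dictated by the widest channel. The catch is that per-channel scales on the input channels of an activation do not map onto a standard integer matrix multiply, which is the kind of hardware-compatibility concern that pushes existing PTQ methods toward the simple quantizer.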
Contribution
The paper proposes RepQuant, a new PTQ method using scale reparameterization to decouple quantization and inference, enabling accurate and hardware-friendly quantization of large transformers.
Findings
Significant performance improvements over existing PTQ methods.
Effective quantization of LayerNorm and Softmax activations.
Versatile application across vision, language, and multi-modal transformers.
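For Softmax activations, one way to see how a finer quantizer can still run with hardware-friendly operations is a logarithmic quantizer with base √2 instead of base 2. The sketch below assumes a log-√2 quantizer with scale 1 and 4-bit codes. It checks that the dequantized value s·2^(-q/2) equals a pure power of two (applicable as a bit shift) times a constant that depends only on the parity of q, which can therefore be absorbed into a scale factor. The function names are hypothetical, and the paper's exact formulation may differ.

```python
import math

BITS = 4
QMAX = 2 ** BITS - 1

def log_sqrt2_quantize(a, scale=1.0):
    """q = clamp(round(-log_sqrt2(a / scale)), 0, QMAX); non-positive inputs map to QMAX."""
    if a <= 0.0:
        return QMAX
    q = round(-2.0 * math.log2(a / scale))  # log base sqrt(2) is 2 * log base 2
    return min(max(q, 0), QMAX)

def dequant_sqrt2(q, scale=1.0):
    """Direct dequantization: scale * sqrt(2) ** (-q) == scale * 2 ** (-q / 2)."""
    return scale * 2.0 ** (-q / 2)

def dequant_pow2(q, scale=1.0):
    """Equivalent form: a parity-dependent constant times a pure power of two.
    2 ** (-q / 2) == 2 ** -(q // 2) * (1 / sqrt(2) if q is odd else 1)."""
    parity_const = 1.0 / math.sqrt(2.0) if q % 2 else 1.0
    return math.ldexp(scale * parity_const, -(q // 2))  # ldexp(x, k) == x * 2 ** k

# Numerically stable softmax over a few example logits
logits = [2.0, 1.0, 0.1, -1.5, 0.5]
m = max(logits)
exps = [math.exp(v - m) for v in logits]
total = sum(exps)
probs = [e / total for e in exps]

codes = [log_sqrt2_quantize(p) for p in probs]
for p, q in zip(probs, codes):
    print(f"p={p:.4f}  q={q:2d}  direct={dequant_sqrt2(q):.4f}  pow2={dequant_pow2(q):.4f}")
```

Compared with base 2, base √2 halves the spacing between neighbouring levels in the log2 domain, which suits the many small Softmax probabilities, while the power-of-two form keeps the inference-side arithmetic shift-based.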
Abstract
Large transformer models have demonstrated remarkable success. Post-training quantization (PTQ), which requires only a small dataset for calibration and avoids end-to-end retraining, is a promising solution for compressing these large models. Regrettably, existing PTQ methods typically exhibit non-trivial performance loss. We find that the performance bottleneck stems from over-consideration of hardware compatibility in the quantization process, which compels these methods to employ simple quantizers at the expense of accuracy. Based on this insight, we propose RepQuant, a novel PTQ framework with a quantization-inference decoupling paradigm to address this issue. RepQuant employs complex quantizers in the quantization process and simplified quantizers in the inference process, and performs mathematically equivalent transformations between the two through quantization…
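The mathematically equivalent transformation between a complex and a simplified quantizer can be illustrated for LayerNorm activations. The sketch below assumes the complex quantizer is an 8-bit per-channel uniform quantizer on the LayerNorm output, and that its scales s and integer zero points z are collapsed into one layer-wise pair (s̃ = mean(s), z̃ = round(mean(z))) by rescaling the LayerNorm affine parameters and the input rows of the following linear layer. With r1 = s/s̃ and r2 = z − z̃, the reparameterized LayerNorm outputs (x + s·r2)/r1, and the next layer uses W̃ = diag(r1)·W and b̃ = b − (s·r2)·W. All names, the min/max calibration, and the use of the mean as the layer-wise target are assumptions for illustration.

```python
import random

BITS = 8
QMAX = 2 ** BITS - 1

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    inv_std = 1.0 / (var + eps) ** 0.5
    return [g * (v - mu) * inv_std + bt for v, g, bt in zip(x, gamma, beta)]

def linear(x, w, b):
    # w is stored as w[in_feature][out_feature]
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def fake_quant(v, scale, zero):
    q = min(max(round(v / scale) + zero, 0), QMAX)
    return scale * (q - zero)

def calibrate_per_channel(samples):
    """Min/max calibration: one (scale, integer zero point) per channel."""
    scales, zeros = [], []
    for c in range(len(samples[0])):
        lo = min(0.0, min(s[c] for s in samples))
        hi = max(0.0, max(s[c] for s in samples))
        scales.append((hi - lo) / QMAX)
        zeros.append(round(-lo / scales[-1]))
    return scales, zeros

def reparameterize(gamma, beta, w, b, scales, zeros):
    """Fold per-channel (scale, zero) into LayerNorm and the next linear layer."""
    n, m = len(scales), len(b)
    s_t = sum(scales) / n
    z_t = round(sum(zeros) / n)               # integer, so r2 below is integer too
    r1 = [s / s_t for s in scales]
    r2 = [z - z_t for z in zeros]
    shift = [s * r for s, r in zip(scales, r2)]
    gamma_t = [g / r for g, r in zip(gamma, r1)]
    beta_t = [(bt + sh) / r for bt, sh, r in zip(beta, shift, r1)]
    w_t = [[r1[i] * w[i][j] for j in range(m)] for i in range(n)]
    b_t = [b[j] - sum(shift[i] * w[i][j] for i in range(n)) for j in range(m)]
    return gamma_t, beta_t, w_t, b_t, s_t, z_t

random.seed(0)
D_IN, D_OUT = 8, 4
gamma = [random.uniform(0.1, 5.0) for _ in range(D_IN)]  # uneven channel ranges
beta = [random.uniform(-2.0, 2.0) for _ in range(D_IN)]
w = [[random.gauss(0.0, 1.0) for _ in range(D_OUT)] for _ in range(D_IN)]
b = [random.gauss(0.0, 1.0) for _ in range(D_OUT)]

calib = [layer_norm([random.gauss(0.0, 1.0) for _ in range(D_IN)], gamma, beta)
         for _ in range(64)]
scales, zeros = calibrate_per_channel(calib)
gamma_t, beta_t, w_t, b_t, s_t, z_t = reparameterize(gamma, beta, w, b, scales, zeros)

x = [random.gauss(0.0, 1.0) for _ in range(D_IN)]
# Calibration-time path: per-channel quantizer on the LayerNorm output
h = layer_norm(x, gamma, beta)
y_ref = linear([fake_quant(v, s, z) for v, s, z in zip(h, scales, zeros)], w, b)
# Inference-time path: folded parameters plus one layer-wise quantizer
h_t = layer_norm(x, gamma_t, beta_t)
y_rep = linear([fake_quant(v, s_t, z_t) for v in h_t], w_t, b_t)
max_diff = max(abs(p - q) for p, q in zip(y_ref, y_rep))
print(f"max |y_ref - y_rep| = {max_diff:.1e}")
```

Keeping z̃ an integer makes r2 an integer, so round(x/s + r2) = round(x/s) + r2 and both paths produce the same integer codes; the remaining difference is only floating-point rounding in the folded parameters.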
Taxonomy
Topics: Image Processing and 3D Reconstruction · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Focus · Softmax
