decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu

TL;DR
decoupleQ introduces a novel 2-bit post-training quantization method that decouples model parameters into integer and floating-point parts, significantly improving accuracy and hardware efficiency for large models.
Contribution
It recasts quantization as a constrained optimization problem over the decoupled integer and floating-point parts, enabling more accurate low-bit quantization without extra computational overhead at deployment.
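As a sketch of what such a constrained problem can look like for a single linear layer, the block below uses symbols that are assumptions for illustration and do not appear in the text above: $X$ is a batch of calibration inputs, $W$ the original weights, $\widehat{W}$ the integer part, $s$ and $z$ the floating-point scale and zero point, and $[\alpha, \beta]$ the integer range (for 2 bits, four consecutive integers).

```latex
\begin{aligned}
\min_{\widehat{W},\, s,\, z}\quad & \bigl\lVert X\,(s \odot \widehat{W} + z) - X W \bigr\rVert_2^2 \\
\text{s.t.}\quad & \widehat{W}_{ij} \in \mathbb{Z}, \qquad \alpha \le \widehat{W}_{ij} \le \beta .
\end{aligned}
```

Because the reconstructed weight $s \odot \widehat{W} + z$ is still a linear, uniform mapping of integer codes, a formulation of this kind is consistent with the hardware-friendly property listed under Findings.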
Findings
Achieves near fp16/bf16 accuracy on 2-bit quantized speech models.
Outperforms existing low-bit quantization methods in accuracy.
Maintains hardware-friendly linear and uniform quantization.
Abstract
Quantization has emerged in recent years as one of the most promising compression technologies for deploying efficient large models in various real-time applications. Considering that the storage and IO of weights take up the vast majority of the overhead inside a large model, weight-only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degradation at very low bits, or require additional computational overhead when deployed, making them difficult to apply to large-scale applications in industry. In this paper, we propose decoupleQ, achieving a substantial increase in model accuracy, especially at very low bits. decoupleQ abandons the traditional heuristic quantization paradigm and decouples the model parameters into integer and floating-point parts, thus transforming the quantization problem into a traditional mathematical…
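The decoupling idea can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a plain weight-reconstruction error (no calibration data), a single scale and zero point for the whole vector, and hypothetical helper names (`nearest_levels`, `fit_scale_zero`, `minmax_init`, `decoupled_fit`). It alternates between solving for the integer codes with the floating-point part fixed and refitting the floating-point part by least squares with the codes fixed, and compares the result with a heuristic min-max quantizer.

```python
def nearest_levels(w, s, z, bits=2):
    """Integer part: nearest code in [0, 2**bits - 1] for fixed (s, z), with s > 0."""
    top = 2 ** bits - 1
    return [min(top, max(0, round((x - z) / s))) for x in w]


def fit_scale_zero(w, q):
    """Floating-point part: least-squares (s, z) for fixed codes q.

    Returns None when all codes are equal, since the scale is then undetermined.
    """
    n = len(w)
    mean_q, mean_w = sum(q) / n, sum(w) / n
    var_q = sum((qi - mean_q) ** 2 for qi in q)
    if var_q == 0:
        return None
    s = sum((qi - mean_q) * (wi - mean_w) for qi, wi in zip(q, w)) / var_q
    return s, mean_w - s * mean_q


def mse(w, q, s, z):
    """Mean squared error between w and its reconstruction s * q + z."""
    return sum((wi - (s * qi + z)) ** 2 for wi, qi in zip(w, q)) / len(w)


def minmax_init(w, bits=2):
    """Heuristic baseline: spread the levels evenly between min(w) and max(w)."""
    lo, hi = min(w), max(w)
    s = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    return nearest_levels(w, s, lo, bits), s, lo


def decoupled_fit(w, bits=2, iters=20):
    """Alternate between the two sub-problems, starting from the min-max solution."""
    q, s, z = minmax_init(w, bits)
    for _ in range(iters):
        fit = fit_scale_zero(w, q)
        if fit is None or fit[0] <= 0:
            break  # degenerate fit: keep the last consistent (q, s, z)
        s, z = fit
        q = nearest_levels(w, s, z, bits)
    return q, s, z


# Illustrative weights with one outlier (made-up values, not from the paper).
weights = [-0.9, -0.35, -0.3, 0.0, 0.05, 0.1, 0.4, 2.0]
q0, s0, z0 = minmax_init(weights)
q1, s1, z1 = decoupled_fit(weights)
print("min-max MSE:    ", mse(weights, q0, s0, z0))
print("alternating MSE:", mse(weights, q1, s1, z1))
```

Each half-step solves its sub-problem exactly, so in this toy setting the reconstruction error never rises above the min-max baseline. The dequantized weight remains `s * q + z`, a linear and uniform mapping.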
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Sparse and Compressive Sensing Techniques
