BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
Junyu Chen, Jungang Li, Jing Xiong, Wenjie Wang, Qingyao Yang, He Xiao, Zhen Li, Taiqiang Wu, Mengzhao Chen, Zhen Peng, Chaofan Tao, Long Shi, Hongxia Yang, Ngai Wong

TL;DR
This paper introduces BPDQ, a post-training quantization method for large language models that builds a variable quantization grid from bit-planes, improving the accuracy and efficiency of 2-bit quantization.
Contribution
BPDQ constructs a variable quantization grid with bit-planes and iteratively refines it, expanding the feasible set and aligning with the optimization objective in Hessian geometry.
Findings
Enables serving Qwen2.5-72B on a single RTX 3090 at 2-bit with 83.85% GSM8K accuracy.
Shows theoretically that the variable grid expands the feasible set and aligns with the optimization objective in Hessian geometry.
Improves quantization fidelity over fixed-grid methods in low-bit regimes.
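The feasible-set claim can be illustrated with a short worked statement for the 2-bit case. This is a sketch, not the paper's proof: the symbols $s$, $z$, $c_0$, $c_1$, $c_2$, $b_1$, $b_2$ and the layer-wise objective with $H = XX^\top$ are assumptions chosen for illustration (the objective is the form commonly used in second-order PTQ; the summary itself only says "Hessian geometry").

```latex
% Uniform 2-bit grid with step s and offset z (assumed notation):
\mathcal{G}_{\mathrm{uni}}(s, z) = \{\, z + s\,k \;:\; k \in \{0,1,2,3\} \,\}

% Grid spanned by two bit-planes b_1, b_2 with scalar coefficients (assumed notation):
\mathcal{G}_{\mathrm{bp}}(c_0, c_1, c_2) = \{\, c_0 + c_1 b_1 + c_2 b_2 \;:\; b_1, b_2 \in \{0,1\} \,\}

% Writing k = b_1 + 2 b_2 shows that every uniform grid is a bit-plane grid:
\mathcal{G}_{\mathrm{uni}}(s, z) = \mathcal{G}_{\mathrm{bp}}(z,\; s,\; 2s)

% Hence, for any error measure E, and in particular for the assumed
% Hessian-weighted one E(\hat{W}) = \operatorname{tr}\!\big((W-\hat{W})\,H\,(W-\hat{W})^{\top}\big),\ H = XX^{\top}:
\min_{c_0, c_1, c_2}\; \min_{\hat{W} \in \mathcal{G}_{\mathrm{bp}}(c_0, c_1, c_2)} E(\hat{W})
\;\le\;
\min_{s, z}\; \min_{\hat{W} \in \mathcal{G}_{\mathrm{uni}}(s, z)} E(\hat{W})
```

The uniform grid is the special case $c_2 = 2c_1$; dropping that constraint can only lower the attainable minimum, and never raises it.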
Abstract
Large language model inference is often bounded by memory footprint and bandwidth in resource-constrained deployments, making quantization fundamental to efficient serving. While post-training quantization (PTQ) maintains high fidelity at 4-bit, it deteriorates at 2-3 bits. In essence, existing methods enforce a shape-invariant quantization grid (e.g., the fixed uniform intervals of UINT2) for each group, severely restricting the feasible set for error minimization. To address this, we propose Bit-Plane Decomposition Quantization (BPDQ), which constructs a variable quantization grid via bit-planes and scalar coefficients, and iteratively refines them using second-order information while progressively compensating for quantization errors to minimize output discrepancy. In the 2-bit regime, BPDQ enables serving Qwen2.5-72B on a single RTX 3090 with 83.85% GSM8K accuracy (vs. 90.83% at…
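As a rough illustration of a grid built from bit-planes and scalar coefficients, the sketch below fits a 2-plane grid to one weight group by alternating between assigning bits and re-solving for the coefficients. It is not the BPDQ algorithm: it minimizes plain squared weight error rather than a Hessian-weighted objective, it does no error compensation, and the names `bitplane_grid`, `fit_variable_grid`, and `uniform_rtn` are hypothetical helpers introduced here.

```python
import itertools
import numpy as np

def bitplane_grid(coeffs):
    """Return all bit patterns and the levels c0 + sum_i c_i * b_i, b_i in {0, 1}."""
    c0, rest = coeffs[0], np.asarray(coeffs[1:])
    patterns = np.array(list(itertools.product([0, 1], repeat=len(rest))))
    return patterns, c0 + patterns @ rest

def uniform_rtn(w, n_bits=2):
    """Baseline: round-to-nearest on a fixed uniform min/max grid."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2**n_bits - 1)
    q = np.clip(np.round((w - lo) / step), 0, 2**n_bits - 1)
    return lo + step * q

def fit_variable_grid(w, n_planes=2, n_iters=20):
    """Alternate bit assignment and least-squares coefficient updates (assumed scheme)."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2**n_planes - 1)
    # Start from the uniform grid: coefficients step * 2^i reproduce lo + step * k.
    coeffs = np.array([lo] + [step * 2**i for i in range(n_planes)])
    for _ in range(n_iters):
        patterns, levels = bitplane_grid(coeffs)
        # Assignment step: each weight takes the bit pattern of its nearest level.
        nearest = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
        planes = patterns[nearest]
        # Coefficient step: least-squares fit of [1, b_1, ..., b_k] to the weights.
        design = np.column_stack([np.ones(len(w)), planes])
        coeffs, *_ = np.linalg.lstsq(design, w, rcond=None)
    return coeffs, planes, design @ coeffs

rng = np.random.default_rng(0)
w = rng.standard_normal(128)  # one synthetic weight group
coeffs, planes, w_hat = fit_variable_grid(w)
err_uniform = float(np.sum((w - uniform_rtn(w)) ** 2))
err_variable = float(np.sum((w - w_hat) ** 2))
print(f"uniform grid error:  {err_uniform:.4f}")
print(f"variable grid error: {err_variable:.4f}")
```

Because the fit starts from the uniform grid and neither step can increase the squared error, the variable grid ends up no worse than the uniform baseline on this objective. Storage stays at one bit per plane per weight, plus the per-group coefficients.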
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Natural Language Processing Techniques
