Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization
Yuli Zhou, Qingxuan Chen, Luca Benini, Guolei Sun, Yawei Li

TL;DR
This paper introduces VQRound, a parameter-efficient framework for adaptive rounding in LLM quantization. It reparameterizes rounding matrices into compact codebooks and optimizes them across all layers using only 128 samples, improving both efficiency and convergence.
Contribution
VQRound reparameterizes adaptive rounding matrices into compact codebooks, enabling scalable, efficient, and effective quantization for large language models.
Findings
VQRound outperforms traditional adaptive rounding in convergence speed.
Uses only 0.2% of the trainable parameters of dense adaptive rounding.
Achieves better quantization results on multiple LLM benchmarks.
Abstract
Adaptive rounding has emerged as an alternative to round-to-nearest (RTN) for post-training quantization by enabling cross-element error cancellation. Yet dense, element-wise rounding matrices are prohibitively expensive for billion-parameter large language models (LLMs). We revisit adaptive rounding from an efficiency perspective and propose VQRound, a parameter-efficient optimization framework that reparameterizes the rounding matrix into a compact codebook. Unlike low-rank alternatives, VQRound minimizes the element-wise worst-case error under the L∞ norm, which is critical for handling heavy-tailed weight distributions in LLMs. Beyond reparameterization, we identify rounding initialization as a decisive factor and develop a lightweight end-to-end finetuning pipeline that optimizes codebooks across all layers using only 128 samples. Extensive experiments on OPT, LLaMA,…
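The codebook idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a uniform quantizer with a single scale, groups the per-element rounding values into short vectors, and fits a small codebook to them with plain k-means, which is a simple L2 stand-in here rather than the worst-case criterion the abstract describes. The function name `vq_rounding_sketch` and the parameters `K`, `d`, and `iters` are hypothetical.

```python
import numpy as np

def vq_rounding_sketch(W, scale, K=16, d=4, iters=5, seed=0):
    """Hypothetical sketch: express a rounding matrix through a small codebook."""
    rng = np.random.default_rng(seed)
    base = np.floor(W / scale)                  # round-down integer grid
    frac = W / scale - base                     # fractional parts in [0, 1)
    vecs = frac.reshape(-1, d)                  # group d entries per vector
    # Initialize the codebook from K distinct data vectors (an assumption).
    codebook = vecs[rng.choice(len(vecs), K, replace=False)].copy()
    for _ in range(iters):                      # plain Lloyd (k-means) steps
        dist = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dist.argmin(1)                    # nearest codeword per vector
        for k in range(K):
            if np.any(idx == k):
                codebook[k] = vecs[idx == k].mean(0)
    # Rebuild the soft rounding matrix from codewords, then harden it.
    H = codebook[idx].reshape(W.shape)          # values stay within [0, 1)
    W_q = scale * (base + (H >= 0.5))           # round up where H >= 0.5
    return W_q, codebook, idx

W = np.random.default_rng(1).standard_normal((64, 64))
W_q, codebook, idx = vq_rounding_sketch(W, scale=0.1)
print(codebook.size, "codebook entries vs", W.size, "dense rounding entries")
```

In this toy setup the only continuous parameters are the `K * d` codebook entries, while the assignments in `idx` are fixed integers. A dense rounding matrix would instead carry one variable per weight. Every quantized weight still lands on one of the two grid points adjacent to the original value, so the per-element error never exceeds `scale`.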
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Speech Recognition and Synthesis
