Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization

Yuli Zhou; Qingxuan Chen; Luca Benini; Guolei Sun; Yawei Li

arXiv:2602.02151 · cs.LG · February 3, 2026

TL;DR

This paper introduces VQRound, a parameter-efficient framework for adaptive rounding in LLM quantization. It improves efficiency and convergence by reparameterizing the rounding matrices as compact codebooks and optimizing them with only 128 calibration samples.
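For reference, the standard AdaRound formulation of Nagel et al. (2020), which adaptive-rounding methods build on, learns one continuous rounding variable per weight. The block below states that formulation. The last line, which reads the dense matrix $V$ out of a shared codebook, is an assumed notation for illustration, not necessarily the paper's exact parameterization.

```latex
% Soft-quantized weights: s = step size, [n, p] = integer grid limits
\widetilde{W} = s \cdot \operatorname{clip}\!\Big(\Big\lfloor \tfrac{W}{s} \Big\rfloor + h(V),\; n,\; p\Big),
\qquad
h(V) = \operatorname{clip}\!\big(\sigma(V)\,(\zeta - \gamma) + \gamma,\; 0,\; 1\big),
\quad \zeta = 1.1,\ \gamma = -0.1

% Layer-wise objective on calibration inputs X; f_reg pushes h(V) toward 0 or 1
\min_{V \in \mathbb{R}^{m \times n}} \; \big\lVert W X - \widetilde{W} X \big\rVert_F^2 + \lambda\, f_{\mathrm{reg}}(V)

% Assumed codebook reparameterization: V is split into mn/d sub-vectors v_j,
% each read from a shared codebook C with K rows (K d trainable values instead of m n)
v_j = C_{a_j}, \qquad C \in \mathbb{R}^{K \times d}, \qquad a_j \in \{1, \dots, K\}
```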

Contribution

VQRound reparameterizes adaptive rounding matrices into compact codebooks, enabling scalable, efficient, and effective quantization for large language models.
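To make the reparameterization concrete, here is a minimal NumPy sketch. It makes several assumptions. It uses an AdaRound-style soft rounding function. The flattened rounding matrix is split into consecutive sub-vectors of length `d`. Each sub-vector has a fixed integer assignment to one row of a trainable `K × d` codebook. The function names, the 4-bit grid, and the sizes `K` and `d` are hypothetical illustrations, not the paper's configuration or API. The sketch does not say how the assignments should be chosen, even though the paper identifies rounding initialization as decisive. The usage lines simply draw them at random.

```python
import numpy as np


def rectified_sigmoid(v, zeta=1.1, gamma=-0.1):
    # AdaRound-style soft rounding h(V): stretched sigmoid clipped to [0, 1]
    return np.clip((zeta - gamma) / (1.0 + np.exp(-v)) + gamma, 0.0, 1.0)


def decode_rounding_matrix(codebook, assignments, shape):
    # Hypothetical layout: codebook[assignments[j]] fills the j-th
    # consecutive d-element chunk of the flattened rounding matrix V
    return codebook[assignments].reshape(shape)


def soft_quantize(W, scale, codebook, assignments, qmin=-8, qmax=7):
    # Differentiable surrogate used while the codebook is being optimized
    V = decode_rounding_matrix(codebook, assignments, W.shape)
    return scale * np.clip(np.floor(W / scale) + rectified_sigmoid(V), qmin, qmax)


def hard_quantize(W, scale, codebook, assignments, qmin=-8, qmax=7):
    # After optimization, each offset snaps to 0 (round down) or 1 (round up)
    V = decode_rounding_matrix(codebook, assignments, W.shape)
    up = (rectified_sigmoid(V) >= 0.5).astype(W.dtype)
    return scale * np.clip(np.floor(W / scale) + up, qmin, qmax)


def trainable_param_ratio(m, n, K, d):
    # Codebook entries (K * d) versus one continuous value per weight (m * n)
    return (K * d) / (m * n)


rng = np.random.default_rng(0)
m, n, K, d = 64, 64, 16, 8
W = rng.normal(size=(m, n)).astype(np.float32)
scale = np.abs(W).max() / 7
codebook = rng.normal(size=(K, d))
assignments = rng.integers(0, K, size=(m * n) // d)
W_q = hard_quantize(W, scale, codebook, assignments)
print(trainable_param_ratio(4096, 4096, K=4096, d=8))  # 0.001953125
```

Consider a 4096 × 4096 layer with `K = 4096` and `d = 8`. In that case the codebook holds about 0.2% as many trainable values as a dense rounding matrix. This is only an arithmetic illustration, because the summary does not state the codebook sizes the paper actually uses. The integer assignments still have to be stored, but they are not optimized.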

Findings

01. VQRound converges faster than traditional adaptive rounding.
02. It uses only 0.2% of the trainable parameters during optimization.
03. It achieves better quantization results on multiple LLM benchmarks.

Abstract

Adaptive rounding has emerged as an alternative to round-to-nearest (RTN) for post-training quantization by enabling cross-element error cancellation. Yet dense, element-wise rounding matrices are prohibitively expensive for billion-parameter large language models (LLMs). We revisit adaptive rounding from an efficiency perspective and propose VQRound, a parameter-efficient optimization framework that reparameterizes the rounding matrix into a compact codebook. Unlike low-rank alternatives, VQRound minimizes the element-wise worst-case error under the $L_{\infty}$ norm, which is critical for handling heavy-tailed weight distributions in LLMs. Beyond reparameterization, we identify rounding initialization as a decisive factor and develop a lightweight end-to-end finetuning pipeline that optimizes codebooks across all layers using only 128 samples. Extensive experiments on OPT, LLaMA,…
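The abstract starts from the idea that adaptive rounding beats round-to-nearest because rounding errors in different elements can cancel. That idea can be checked on a toy layer. The sketch below illustrates the idea under simplifying assumptions and is not VQRound itself. It scores a rounding by the AdaRound-style layer-output error ‖WX − ŴX‖²_F on calibration inputs `X` and ignores the clipping range. It then finds the best up/down choice per element by brute force, which is only feasible for a handful of weights. That exponential search is exactly what learned rounding matrices replace. The sizes, seed, and helper names are hypothetical.

```python
import itertools

import numpy as np


def output_error(W_q, W, X):
    # AdaRound-style objective: squared error of the layer output W X
    return float(np.sum((W @ X - W_q @ X) ** 2))


def round_to_nearest(W, scale):
    # Element-wise RTN: each weight is rounded independently
    return scale * np.round(W / scale)


def brute_force_adaptive_rounding(W, X, scale):
    # Choose floor or ceil for every element jointly to minimize output error
    base = np.floor(W / scale)
    best_W, best_err = None, np.inf
    for ups in itertools.product((0.0, 1.0), repeat=W.size):
        W_q = scale * (base + np.array(ups).reshape(W.shape))
        err = output_error(W_q, W, X)
        if err < best_err:
            best_W, best_err = W_q, err
    return best_W, best_err


rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))   # tiny layer: 8 weights -> 256 candidate roundings
X = rng.normal(size=(4, 32))
X[1] = X[0] + 0.1 * X[1]      # correlate inputs 0 and 1 so their rounding errors can offset
scale = 0.5

err_rtn = output_error(round_to_nearest(W, scale), W, X)
W_ada, err_ada = brute_force_adaptive_rounding(W, X, scale)
print(f"RTN output error:      {err_rtn:.4f}")
print(f"adaptive output error: {err_ada:.4f}")
```

RTN's choice is one of the 256 candidates, so on this objective the searched rounding can never do worse than RTN. How much better it does depends on the weights and inputs.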

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Speech Recognition and Synthesis