LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation
Hongyaoxing Gu, Lijuan Hu, Liye Yu, Haowei Li, Fangfang Liu

TL;DR
LoPRo introduces a fine-tuning-free low-rank quantization method that uses block-wise permutation and Walsh-Hadamard transforms to improve accuracy at 2-3 bits, outperforming existing methods on large language models.
Contribution
The paper presents LoPRo, a novel low-rank quantization approach that enhances residual matrix quantization with permutation and transformation techniques, eliminating the need for fine-tuning.
Findings
Outperforms existing fine-tuning-free PTQ methods at 2- and 3-bit quantization.
Achieves state-of-the-art accuracy on LLaMA-2 and LLaMA-3 models.
Reduces perplexity and improves accuracy on Mixtral-8x7B with efficient quantization.
Abstract
Post-training quantization (PTQ) enables effective model compression while preserving relatively high accuracy. Current weight-only PTQ methods primarily focus on the challenging sub-3-bit regime, where approaches often suffer significant accuracy degradation, typically requiring fine-tuning to achieve competitive performance. In this work, we revisit the fundamental characteristics of weight quantization and analyze the challenges in quantizing the residual matrix under low-rank approximation. We propose LoPRo, a novel fine-tuning-free PTQ algorithm that enhances residual matrix quantization by applying block-wise permutation and Walsh-Hadamard transformations to rotate columns of similar importance, while explicitly preserving the quantization accuracy of the most salient column blocks. Furthermore, we introduce a mixed-precision fast low-rank decomposition based on rank-1 sketch…
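The pipeline described in the abstract (low-rank approximation, then block-wise permutation and Walsh-Hadamard rotation of the residual before quantization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions: a plain truncated SVD stands in for the paper's rank-1-sketch decomposition, column importance is taken to be the residual's column norm, the most salient block is left unrotated, and quantization is simple round-to-nearest with one scale per row. All function names and parameters (`low_rank_split`, `rotate_blocks`, `block`, `keep`) are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Normalized Walsh-Hadamard matrix (Sylvester construction); n is a power of two."""
    assert n > 0 and n & (n - 1) == 0
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def low_rank_split(W, rank):
    """Truncated SVD (stand-in for the paper's decomposition): returns (L, residual)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return L, W - L

def rotate_blocks(R, perm, block, keep, inverse=False):
    """Permute columns, then rotate each column block except the first `keep` blocks.

    With inverse=True the rotation and the permutation are undone.
    """
    assert R.shape[1] % block == 0
    H = hadamard(block)
    X = R.copy() if inverse else R[:, perm]
    for s in range(keep * block, X.shape[1], block):
        X[:, s:s + block] = X[:, s:s + block] @ (H.T if inverse else H)
    if not inverse:
        return X
    out = np.empty_like(X)
    out[:, perm] = X  # scatter columns back to their original positions
    return out

def quantize_rtn(X, bits):
    """Symmetric round-to-nearest with one scale per row; returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(X).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.clip(np.round(X / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))

L, R = low_rank_split(W, rank=8)
importance = np.linalg.norm(R, axis=0)  # assumption: column norm as saliency proxy
perm = np.argsort(-importance)          # most salient columns first

R_rot = rotate_blocks(R, perm, block=16, keep=1)
R_hat = rotate_blocks(quantize_rtn(R_rot, bits=3), perm, block=16, keep=1, inverse=True)
W_hat = L + R_hat

rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the normalized Hadamard matrix is orthogonal, the rotation is exactly invertible, so all reconstruction error in this sketch comes from the quantizer. Sorting columns before rotating means each block mixes columns of comparable magnitude, which is the intuition the abstract gives for rotating "columns of similar importance".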
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* LoPRo significantly improves over state-of-the-art algorithms on perplexity and downstream benchmarks.
* Quantization runtimes do not increase substantially.
* Does not require fine-tuning, unlike other low-bit quantization schemes.
* If W is dense in information, the low-rank approximation may not scale, offloading the majority of the error correction to quantization.
* Experiments are conducted on Llama-2, Llama-3, and Mixtral. The results could be strengthened with experiments on frontier open-source models such as Qwen2.5/Qwen3 or DeepSeek.
* The proposed method yields strong performance, outperforming the state-of-the-art competing approach at 2-bit quantization by a significant margin.
* Adding the low-rank adapter increases latency by only ~10% compared with a baseline that does not use the low-rank component.
* The paper includes detailed ablation studies on (i) the choice of rotation matrix and (ii) the LoRA rank (see the appendix).
**Quantization cost**
* It is claimed that the method is almost as fast as GPTQ in the scalar quantization case and faster than GPTVQ in the vector quantization case. However, the GPTQ baseline seems to be unoptimized. The GPTQModel repository in fact quantizes models faster than the provided numbers suggest. Specifically, in my experience, quantizing a 7-8B Llama model takes ~8 minutes on a single L40S. LoPRo_v is claimed to be faster than GPTVQ, but LoPRo_v is in fact an enhancement of GPTVQ and should take…
Originality: I found the contributions in Sec. 3.2 (partial rotation quantization) and Sec. 3.3 (R1SVD) sufficiently novel.
Clarity: The paper is well written and easy to understand.
Quality: I think the authors did a good job on both the technical contribution and the presentation.
Significance: PTQ for LLMs remains one of the hot topics in AI research these days. This work seems a valuable contribution to an already-mature field.
- The idea of rotation in PTQ already exists in the literature (e.g., SmoothRot).
- Sec. 3.4 is a bit sloppy and too concise for a detailed analysis.
- The paper contains grammatical errors and needs more careful proofreading, for example at lines 153-154.
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
