Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
Pingzhi Tang, Ruijie Zhou, Fanxu Meng, Wenjie Pei, Muhan Zhang

TL;DR
This paper introduces LoRDS, a low-rank decomposition framework for LLM quantization and adaptation that surpasses block-wise methods in efficiency and expressiveness, enabling high-fidelity quantization and high-rank PEFT without additional inference costs.
Contribution
LoRDS rethinks quantization granularity by modeling scaling as continuous low-rank matrices, unifying quantization and adaptation with superior performance and efficiency.
Findings
Outperforms state-of-the-art baselines in quantization and fine-tuning.
Achieves up to 27% accuracy improvement at 3 bits on Llama3-8B.
Provides 1.5x inference speedup on NVIDIA RTX 4090.
Abstract
Current quantization methods for LLMs predominantly rely on block-wise structures to maintain efficiency, often at the cost of representational flexibility. In this work, we demonstrate that element-wise quantization can be made as efficient as block-wise scaling while providing strictly superior expressive power by modeling the scaling manifold as continuous low-rank matrices. We propose Low-Rank Decomposed Scaling (LoRDS), a unified framework that rethinks quantization granularity through this low-rank decomposition. By "breaking the blocks" of spatial constraints, LoRDS establishes a seamless efficiency lifecycle: it provides high-fidelity PTQ initialization refined via iterative optimization, enables joint QAT of weights and scaling factors, and facilitates high-rank multiplicative PEFT adaptation. Unlike additive PEFT approaches such as QLoRA, LoRDS enables high-rank…
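The relationship between block-wise scaling and a low-rank scale matrix can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the absmax scale rule, and the symbols `B`, `A`, and `S` are assumptions made for illustration. It shows only that per-block scales, once expanded to an element-wise scale matrix `S`, factor exactly as `S = B @ A` with rank at most the number of blocks per row, so block-wise scaling is one special case of a low-rank scale matrix. How LoRDS initializes or optimizes its factors is not described in this summary.

```python
import numpy as np

def blockwise_factors(W, block_size, n_bits):
    """Return (B, A) such that B @ A is the element-wise block-wise scale matrix.

    Hypothetical helper: uses symmetric absmax scaling per (row, block).
    """
    out_dim, in_dim = W.shape
    assert in_dim % block_size == 0, "in_dim must be divisible by block_size"
    n_blocks = in_dim // block_size
    qmax = 2 ** (n_bits - 1) - 1
    # One scale per (row, block): largest magnitude in the block over qmax.
    blocks = W.reshape(out_dim, n_blocks, block_size)
    B = np.abs(blocks).max(axis=2) / qmax                     # (out_dim, n_blocks)
    # Indicator matrix: row k has ones on the columns covered by block k.
    A = np.kron(np.eye(n_blocks), np.ones((1, block_size)))   # (n_blocks, in_dim)
    return B, A

def quantize_dequantize(W, S, n_bits):
    """Round W / S to signed integers, then rescale (element-wise scales)."""
    qmax = 2 ** (n_bits - 1) - 1
    Q = np.clip(np.round(W / S), -qmax - 1, qmax)
    return Q * S

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
B, A = blockwise_factors(W, block_size=4, n_bits=4)
S = B @ A                                  # full (8, 16) scale matrix
W_hat = quantize_dequantize(W, S, n_bits=4)

print("rank of S:", np.linalg.matrix_rank(S))
print("max abs error:", np.abs(W - W_hat).max())
```

In this sketch `A` is fixed to a block indicator pattern. Letting both factors take arbitrary values at the same rank is the extra freedom that a continuous low-rank scale matrix offers over block-wise scaling.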
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
