Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation

Pingzhi Tang; Ruijie Zhou; Fanxu Meng; Wenjie Pei; Muhan Zhang

arXiv:2601.22716 · cs.LG · February 2, 2026

TL;DR

This paper introduces LoRDS (Low-Rank Decomposed Scaling), a framework for LLM quantization and adaptation that models the quantization scaling factors as a low-rank matrix. It matches block-wise methods in efficiency while offering greater expressiveness, enabling high-fidelity quantization and high-rank PEFT without additional inference cost.

Contribution

LoRDS rethinks quantization granularity by modeling scaling as continuous low-rank matrices, unifying quantization and adaptation with superior performance and efficiency.

Findings

01. Outperforms state-of-the-art baselines in quantization and fine-tuning.

02. Achieves up to 27% accuracy improvement at 3 bits on Llama3-8B.

03. Provides 1.5x inference speedup on NVIDIA RTX 4090.

Abstract

Current quantization methods for LLMs predominantly rely on block-wise structures to maintain efficiency, often at the cost of representational flexibility. In this work, we demonstrate that element-wise quantization can be made as efficient as block-wise scaling while providing strictly superior expressive power by modeling the scaling manifold as continuous low-rank matrices ($S = BA$). We propose Low-Rank Decomposed Scaling (LoRDS), a unified framework that rethinks quantization granularity through this low-rank decomposition. By "breaking the blocks" of spatial constraints, LoRDS establishes a seamless efficiency lifecycle: it provides high-fidelity PTQ initialization refined via iterative optimization, enables joint QAT of weights and scaling factors, and facilitates high-rank multiplicative PEFT adaptation. Unlike additive PEFT approaches such as QLoRA, LoRDS enables high-rank…
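The relationship between block-wise scaling and a low-rank scale matrix $S = BA$ can be illustrated with a small NumPy sketch. It is not the paper's algorithm: the function names, the symmetric absmax quantizer, and the 4-bit setting are all assumptions chosen for illustration. The sketch only shows that per-row, per-block scales can be written exactly as a product $BA$ whose rank equals the number of blocks, which is why a general low-rank $S$ is at least as expressive as block-wise scaling.

```python
import numpy as np

def blockwise_scales(W, block):
    """Per-row, per-block absmax scales; returns shape (out_dim, n_blocks).

    Hypothetical helper: assumes in_dim is divisible by `block`.
    """
    out_dim, in_dim = W.shape
    n_blocks = in_dim // block
    s = np.abs(W).reshape(out_dim, n_blocks, block).max(axis=2)
    return np.maximum(s, 1e-8)  # guard against all-zero blocks

def blockwise_as_low_rank(s, block):
    """Express block-wise scales as S = B @ A with rank <= n_blocks.

    B holds the per-block scales; A is a 0/1 indicator matrix that
    copies each block's scale to every column in that block.
    """
    n_blocks = s.shape[1]
    B = s                                               # (out_dim, n_blocks)
    A = np.kron(np.eye(n_blocks), np.ones((1, block)))  # (n_blocks, in_dim)
    return B, A

def fake_quantize(W, S, bits=4):
    """Symmetric round-to-nearest with an element-wise scale matrix S.

    Assumed quantizer for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1
    Q = np.clip(np.round(W / S * qmax), -qmax, qmax)  # integer codes
    return Q * S / qmax                               # dequantized weights

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))

s = blockwise_scales(W, block=4)
B, A = blockwise_as_low_rank(s, block=4)
S = B @ A                      # element-wise scales, same shape as W
W_hat = fake_quantize(W, S)    # identical to ordinary block-wise quantization
```

In this special case `A` is fixed to a block indicator pattern. Letting both `B` and `A` be free real-valued matrices of the same rank keeps the parameter count of the same order while removing the block structure, which is the sense in which the abstract describes "breaking the blocks".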

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques