RUQuant: Towards Refining Uniform Quantization for Large Language Models

Han Liu; Haotian Gao; Changya Li; Feng Zhang; Xiaotong Zhang; Wei Wang; Hong Yu

arXiv:2604.04013 · cs.CL · April 7, 2026

TL;DR

RUQuant introduces a theoretically grounded, two-stage orthogonal transformation method for uniform activation quantization in large language models, significantly reducing accuracy loss without retraining.

Contribution

It proposes a novel orthogonal transformation approach based on Lloyd-Max optimality to improve uniform quantization of activations in LLMs.
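
As a general illustration of why an orthogonal transformation can be inserted without retraining, the sketch below checks that rotating activations by an orthogonal matrix and counter-rotating the weights leaves a linear layer's output unchanged. This is not the paper's two-stage construction: the random orthogonal matrix `Q`, the shapes, and the variable names are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))   # hypothetical activations (tokens x hidden size)
W = rng.normal(size=(d, 3))   # hypothetical weight matrix of a linear layer

# QR decomposition of a random Gaussian matrix yields an orthogonal Q,
# i.e. Q @ Q.T equals the identity up to floating-point error.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

Y_ref = X @ W                 # original layer output
Y_rot = (X @ Q) @ (Q.T @ W)   # rotated activations, counter-rotated weights

is_orthogonal = np.allclose(Q @ Q.T, np.eye(d))
outputs_match = np.allclose(Y_ref, Y_rot)
print(is_orthogonal, outputs_match)
```

The rotated activations `X @ Q` have a different value distribution from `X`, while the full-precision output is preserved. That property is what makes it possible to search for a rotation that suits a uniform quantizer.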

Findings

01

Achieves 99.8% of full-precision accuracy with W6A6 quantization.

02

Retains 97% of full-precision accuracy with W4A4 quantization for a 13B LLM.

03

Completes quantization in approximately one minute, with no model fine-tuning.

Abstract

The increasing size and complexity of large language models (LLMs) have raised significant challenges in deployment efficiency, particularly under resource constraints. Post-training quantization (PTQ) has emerged as a practical solution by compressing models without requiring retraining. While existing methods focus on uniform quantization schemes for both weights and activations, they often suffer from substantial accuracy degradation due to the non-uniform nature of activation distributions. In this work, we revisit the activation quantization problem from a theoretical perspective grounded in the Lloyd-Max optimality conditions. We identify the core issue as the non-uniform distribution of activations within the quantization interval, which causes the optimal quantization point under the Lloyd-Max criterion to shift away from the midpoint of the interval. To address this issue, we…
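
The midpoint shift described in the abstract can be illustrated numerically. The sketch below is not the paper's method. It assumes Laplace-distributed samples as a stand-in for activations and a 16-level uniform grid on [-6, 6], and it uses hypothetical helper names (`uniform_levels`, `centroid_levels`, `mse`). It compares the midpoint reconstruction levels of a uniform quantizer with the per-bin centroids required by the Lloyd-Max centroid condition.

```python
import numpy as np

def uniform_levels(lo, hi, n_levels):
    """Bin edges and midpoint reconstruction levels of a uniform quantizer."""
    edges = np.linspace(lo, hi, n_levels + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return edges, mids

def centroid_levels(x, edges):
    """Lloyd-Max centroid condition: each level is the mean of the samples in its bin."""
    n_bins = len(edges) - 1
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=x, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Fall back to the midpoint for any bin that received no samples.
    cents = np.where(counts > 0, sums / np.maximum(counts, 1), mids)
    return cents, idx

def mse(x, levels, idx):
    """Mean squared error when each sample is reconstructed by its bin's level."""
    return float(np.mean((x - levels[idx]) ** 2))

rng = np.random.default_rng(0)
# Assumed stand-in for activations: Laplace samples saturated to the grid range.
x = np.clip(rng.laplace(scale=1.0, size=100_000), -6.0, 6.0)

edges, mids = uniform_levels(-6.0, 6.0, 16)
cents, idx = centroid_levels(x, edges)

mse_mid = mse(x, mids, idx)
mse_cent = mse(x, cents, idx)
print("bin [0, 0.75): midpoint", mids[8], "centroid", cents[8])
print("MSE with midpoints:", mse_mid, "MSE with centroids:", mse_cent)
```

Under these assumptions, the density decays away from zero inside each bin, so the centroids of the two bins adjacent to zero sit closer to zero than the midpoints. Replacing midpoints with centroids lowers the mean squared error for the same bin edges.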
