DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization

Yuantian Shao; Yuanteng Chen; Peisong Wang; Jianlin Yu; Jing Lin; Yiwu Yao; Zhihui Wei; Jian Cheng

arXiv:2511.04063·cs.LG·November 7, 2025

TL;DR

DartQuant is an efficient rotational calibration method for large language model quantization. It significantly reduces the computational cost of rotation optimization, making high-quality model compression feasible in resource-constrained environments.

Contribution

The paper proposes a distribution-aware rotational calibration technique and the QR-Orth optimization scheme, which together reduce the complexity and resource requirements of quantizing large models.
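The general idea of handling an orthogonality constraint through a QR decomposition can be sketched in a few lines of NumPy. This is an illustrative assumption about what such a scheme might look like, not the paper's QR-Orth procedure: an unconstrained latent matrix `z` is mapped to an orthogonal matrix by QR, so any update to `z` yields a valid rotation without a separate re-orthogonalization step. The objective `peak_objective`, the synthetic activations, and the random-search update (swapped in for a gradient-based optimizer to keep the sketch dependency-free) are all hypothetical.

```python
import numpy as np

def orthogonal_from_latent(z):
    """Map an unconstrained square matrix to an orthogonal one via QR."""
    q, _ = np.linalg.qr(z)
    return q

def peak_objective(x, rot):
    """Hypothetical stand-in objective: mean per-row peak magnitude after rotation."""
    return np.abs(x @ rot).max(axis=1).mean()

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 32))
x[:, 5] *= 20.0  # synthetic outlier channel (assumption for illustration)

z = np.eye(32)  # latent matrix; it is free to drift away from orthogonality
initial = peak_objective(x, orthogonal_from_latent(z))
best = initial

# Random-search hill climbing on the latent matrix (not the paper's optimizer).
for _ in range(300):
    cand = z + 0.05 * rng.standard_normal(z.shape)
    score = peak_objective(x, orthogonal_from_latent(cand))
    if score < best:
        z, best = cand, score

rot = orthogonal_from_latent(z)
# Largest deviation of rot.T @ rot from the identity; near zero for an orthogonal matrix.
ortho_error = np.abs(rot.T @ rot - np.eye(32)).max()
```

The point of the parameterization is that `rot` stays orthogonal however `z` is perturbed, so the search never has to project back onto the set of orthogonal matrices.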

Findings

01. Achieves 47× acceleration in rotational optimization.

02. Saves 10× memory compared to existing methods.

03. Enables quantization of 70B models on a single GPU.

Abstract

Quantization plays a crucial role in accelerating the inference of large-scale models, and rotational matrices have been shown to effectively improve quantization performance by smoothing outliers. However, end-to-end fine-tuning of rotational optimization algorithms incurs high computational costs and is prone to overfitting. To address this challenge, we propose an efficient distribution-aware rotational calibration method, DartQuant, which reduces the complexity of rotational optimization by constraining the distribution of the activations after rotation. This approach also effectively reduces reliance on task-specific losses, thereby mitigating the risk of overfitting. Additionally, we introduce the QR-Orth optimization scheme, which replaces expensive alternating optimization with a more efficient solution. In a variety of model quantization experiments, DartQuant demonstrates…
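The outlier-smoothing effect of rotations mentioned in the abstract can be illustrated with a small NumPy sketch. It is not DartQuant itself: the rotation here is a random orthogonal matrix rather than a calibrated one, and the synthetic activations, the single outlier channel, the 4-bit per-row quantizer, and the helper names `random_orthogonal` and `fake_quant` are all assumptions made for illustration.

```python
import numpy as np

def random_orthogonal(n, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the diagonal of R is positive (a common convention).
    return q * np.sign(np.diag(r))

def fake_quant(x, bits=4):
    """Symmetric per-row quantize-dequantize (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    return np.round(x / scale) * scale

rng = np.random.default_rng(1)
x = rng.standard_normal((256, 128))  # synthetic activations
x[:, 3] *= 20.0                      # one channel carries large outliers
w = rng.standard_normal((128, 64))   # synthetic weight matrix

rot = random_orthogonal(128)
x_rot = x @ rot    # rotated activations
w_rot = rot.T @ w  # counter-rotated weights, so x_rot @ w_rot equals x @ w

def rel_error(a):
    """Relative Frobenius-norm error introduced by quantizing a."""
    return np.linalg.norm(fake_quant(a) - a) / np.linalg.norm(a)

err_plain = rel_error(x)
err_rot = rel_error(x_rot)
```

Because `rot` is orthogonal, folding its transpose into the weights leaves the layer output unchanged in exact arithmetic, so only the quantization error differs. On data like this, where one channel dominates each row's dynamic range, `err_rot` would be expected to come out lower than `err_plain`, since the rotation spreads the outlier's energy across all channels.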

Videos

DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Natural Language Processing Techniques