DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

Haokun Lin; Xinle Jia; Haobo Xu; Bingchen Yao; Xianglong Guo; Yichen Wu; Zhichao Lu; Ying Wei; Qingfu Zhang; Zhenan Sun

arXiv:2604.17789 · cs.CV · April 22, 2026

TL;DR

DuQuant++ introduces an outlier-aware rotation technique tailored to the MXFP4 microscaling format, improving quantization accuracy for large language models while reducing online rotation cost.

Contribution

DuQuant++ adapts DuQuant's fine-grained rotation to MXFP4 by matching the rotation block size to the microscaling group size, which reduces rotation complexity and improves quantization performance by specifically targeting activation outliers.

Findings

01 Achieves state-of-the-art quantization performance on LLaMA-3 models.

02 Halves the online rotation cost compared to previous methods.

03 Effectively smooths weight distribution and handles outliers.

Abstract

The MXFP4 microscaling format, which partitions tensors into blocks of 32 elements sharing an E8M0 scaling factor, has emerged as a promising substrate for efficient LLM inference, backed by native hardware support on NVIDIA Blackwell Tensor Cores. However, activation outliers pose a unique challenge under this format: a single outlier inflates the shared block scale, compressing the effective dynamic range of the remaining elements and causing significant quantization error. Existing rotation-based remedies, including randomized Hadamard and learnable rotations, are data-agnostic and therefore unable to specifically target the channels where outliers concentrate. We propose DuQuant++, which adapts the outlier-aware fine-grained rotation of DuQuant to the MXFP4 format by aligning the rotation block size with the microscaling group size (B = 32). Because each MXFP4 group possesses an…
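
The scale-inflation effect described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: `mxfp4_quantize_block` is a hypothetical helper that simulates ("fake-quantizes") one 32-element block, and the shared-exponent rule (floor of log2 of the block maximum, minus the largest E2M1 exponent, 2) is an assumption based on the usual description of the OCP microscaling formats. It does not implement the DuQuant++ rotation itself.

```python
import numpy as np

# Representable magnitudes of an FP4 (E2M1) element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def mxfp4_quantize_block(x):
    """Fake-quantize one block with a single shared power-of-two scale.

    Hypothetical helper for illustration only; the exponent rule is an
    assumption, not taken from the paper.
    """
    amax = np.max(np.abs(x))
    if amax == 0:
        return np.zeros_like(x)
    # E8M0 scales are pure powers of two; 2 is the largest E2M1 exponent.
    shared_exp = np.floor(np.log2(amax)) - 2
    scale = 2.0 ** shared_exp
    # Clip to the largest FP4 magnitude, then round to the nearest grid point.
    scaled = np.clip(np.abs(x) / scale, 0.0, 6.0)
    idx = np.argmin(np.abs(scaled[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(x) * FP4_GRID[idx] * scale


# A well-behaved block of 32 values in [-1, 1].
block = np.linspace(-1.0, 1.0, 32)
q_clean = mxfp4_quantize_block(block)

# The same block with one large outlier in the first position.
block_outlier = block.copy()
block_outlier[0] = 64.0
q_outlier = mxfp4_quantize_block(block_outlier)

# Mean absolute error on the 31 elements that are identical in both blocks.
clean_err = np.mean(np.abs(q_clean[1:] - block[1:]))
outlier_err = np.mean(np.abs(q_outlier[1:] - block_outlier[1:]))

print("error without outlier:", clean_err)
print("error with outlier:   ", outlier_err)
```

In this toy setup the outlier pushes the shared scale so high that every other element in the block rounds to zero, which is the failure mode that motivates outlier-aware rotation within each 32-element group.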

Code & Models

Repositories

Hsu1023/DuQuant-v2 (GitHub)
