BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization
Ji-Fu Li, Manyi Zhang, Xiaobo Xia, Han Bao, Haoli Bai, Zhenhua Dong, Xianzhi Yu

TL;DR
BATQuant is a post-training quantization method for MXFP4 formats that improves outlier resilience and accuracy in multimodal and large language models by restricting transformations to MXFP block granularity and optimizing the shape of the quantized distributions.
Contribution
It introduces BATQuant, a block-wise affine transformation approach with Kronecker decomposition and learnable clipping that addresses the format mismatch between rotation-based quantization methods and MXFP4's block-wise scaling.
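The following NumPy sketch illustrates how a block-restricted, Kronecker-factored transform can stay mathematically equivalent to the original linear layer. It rests on several assumptions that are not taken from the paper. The transform is a block-diagonal matrix whose 32×32 blocks each equal A ⊗ B, with a 4×4 factor A and an 8×8 factor B shared across blocks. Only the invertible linear part is shown, with no learnable clipping. The factors are random rather than learned. The helpers `block_kron_transform` and `block_kron_inverse` are hypothetical names, not the authors' code.

```python
import numpy as np

BLOCK = 32        # MXFP block size
P, Q = 4, 8       # hypothetical Kronecker factorization, P * Q == BLOCK

def block_kron_transform(a, b, n_channels):
    """Block-diagonal transform T = I ⊗ (A ⊗ B).

    Each 32x32 diagonal block acts on exactly one MX block of channels,
    so no value is mixed across quantization blocks."""
    assert a.shape == (P, P) and b.shape == (Q, Q)
    assert n_channels % BLOCK == 0
    return np.kron(np.eye(n_channels // BLOCK), np.kron(a, b))

def block_kron_inverse(a, b, n_channels):
    # (A ⊗ B)^-1 = A^-1 ⊗ B^-1, so only the small factors are inverted.
    return block_kron_transform(np.linalg.inv(a), np.linalg.inv(b), n_channels)

rng = np.random.default_rng(0)
n_in, n_out, tokens = 128, 64, 16
x = rng.standard_normal((tokens, n_in))     # activations
w = rng.standard_normal((n_in, n_out))      # weights

# Random, well-conditioned factors stand in for the learned ones (assumption).
a = np.eye(P) + 0.1 * rng.standard_normal((P, P))
b = np.eye(Q) + 0.1 * rng.standard_normal((Q, Q))

t = block_kron_transform(a, b, n_in)
t_inv = block_kron_inverse(a, b, n_in)

x_t = x @ t          # transformed activations (these would be quantized to MXFP4)
w_t = t_inv @ w      # compensated weights (these would also be quantized)

# In full precision the layer output is unchanged: (X T)(T^-1 W) = X W.
print(np.allclose(x_t @ w_t, x @ w))   # True

# Parameters per 32x32 block: Kronecker factors vs. a dense block.
print(P * P + Q * Q, BLOCK * BLOCK)
```

With the 4×8 split assumed here, a 32×32 block costs 4² + 8² = 80 parameters instead of 1,024, and only the small factors need to be inverted. The paper defines how BATQuant actually parameterizes, shares, and optimizes its transforms.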
Findings
Achieves up to 96.43% of full-precision performance on multimodal benchmarks.
Outperforms existing quantization methods across various tasks.
Establishes new state-of-the-art results under the W4A4KV16 configuration (4-bit weights and activations, 16-bit KV cache).
Abstract
Microscaling floating-point (MXFP) formats have emerged as a promising standard for deploying Multi-modal Large Language Models (MLLMs) and Large Language Models (LLMs) on modern accelerator architectures. However, existing Post-Training Quantization (PTQ) methods, particularly rotation-based techniques designed for integer formats, suffer from severe performance collapse when applied to MXFP4. Recent studies attribute this failure to a fundamental format mismatch: global orthogonal rotations inadvertently transfer outlier energy across quantization blocks, inducing new outliers that disrupt local block-wise scaling, while often creating bimodal activation distributions that underutilize the limited quantization range. To address these issues, we propose BATQuant (Block-wise Affine Transformation), which restricts transformations to align with MXFP granularity to prevent cross-block…
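The format-mismatch argument can be reproduced on a toy vector. This NumPy sketch follows OCP Microscaling (MX) v1.0 conventions for MXFP4. Elements are grouped into blocks of 32, and each block shares a power-of-two (E8M0) scale of 2^(floor(log2(amax)) − 2). Each element is an FP4 E2M1 value rounded to nearest. A normalized Sylvester Hadamard matrix stands in for a generic global orthogonal rotation. The helpers `quantize_mxfp4`, `hadamard`, and `block_energy` are hypothetical names for illustration, not the authors' code.

```python
import numpy as np

BLOCK = 32                                  # MX block size
FP4_EMAX = 2                                # exponent of largest E2M1 value (6 = 1.5 * 2**2)
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_mxfp4(v):
    """Fake-quantize a 1-D array (length divisible by 32) to MXFP4."""
    blocks = np.asarray(v, dtype=np.float64).reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(amax > 0, amax, 1.0)                 # avoid log2(0)
    scale = 2.0 ** (np.floor(np.log2(safe)) - FP4_EMAX)  # shared E8M0 scale
    mag = np.abs(blocks) / scale
    # Round to the nearest grid point; magnitudes above 6 saturate to 6.
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(blocks) * FP4_GRID[idx] * scale).reshape(-1)

def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return h / np.sqrt(n)

def block_energy(v):
    """Sum of squares inside each 32-element block."""
    return (np.asarray(v).reshape(-1, BLOCK) ** 2).sum(axis=1)

# One outlier sets the shared scale and flushes its block neighbors to zero.
blk = np.ones(BLOCK)
blk[0] = 100.0
print(quantize_mxfp4(blk)[:3])              # outlier -> 96.0, the 1.0 entries -> 0.0

# Global vs. block-restricted rotation on four MX blocks.
rng = np.random.default_rng(0)
n = 4 * BLOCK
x = rng.standard_normal(n)
x[0] = 100.0                                # the outlier lives in block 0
h_global = hadamard(n)                                   # one 128x128 rotation
h_block = np.kron(np.eye(n // BLOCK), hadamard(BLOCK))   # four 32x32 rotations

print(block_energy(x).round(1))
print(block_energy(h_global @ x).round(1))  # outlier energy leaks into blocks 1-3
print(block_energy(h_block @ x).round(1))   # per-block energy is preserved
```

The global rotation adds 100/√128 ≈ 8.8 to every element of the three clean blocks, which raises their shared scales. The block-diagonal rotation confines the outlier's energy to block 0, so the other blocks keep their own local scales.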
Taxonomy
Topics: Numerical Methods and Algorithms · Natural Language Processing Techniques · Speech Recognition and Synthesis
