SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization

Chengzhu Bao; Xianglong Yan; Zhiteng Li; Guangshuo Qin; Guanghua Yu; Yulun Zhang

arXiv:2605.12245·cs.LG·May 13, 2026

TL;DR

SOAR is a post-training quantization framework that optimizes the scales used by the NVFP4 format, improving LLM accuracy at the same memory footprint and without hardware changes.

Contribution

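It introduces two scale optimization techniques: Closed-form Joint Scale Optimization (CJSO), which jointly optimizes the global and block-wise scales, and Decoupled Scale Search (DSS), which decouples the quantization scale from the dequantization scale. Together they improve NVFP4 quantization accuracy over existing methods.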

Findings

01. Outperforms existing NVFP4 quantization baselines

02. Achieves higher accuracy at the same memory footprint

03. No additional hardware overhead required

Abstract

NVFP4 has recently emerged as an efficient 4-bit microscaling format for large language models (LLMs), offering superior numerical fidelity with native hardware support. However, existing methods often yield suboptimal performance due to inflexible scale selection and the coupled treatment of quantization and dequantization scales. To address these issues, we propose Scale Optimization for Accurate Reconstruction (SOAR), a novel post-training quantization framework that improves the accuracy of NVFP4 quantization. At its core, SOAR features Closed-form Joint Scale Optimization (CJSO), which jointly optimizes global and block-wise scales via analytical solutions derived from reconstruction error minimization. Furthermore, it incorporates Decoupled Scale Search (DSS). DSS decouples the high-precision quantization scale from its constrained dequantization counterpart, and performs discrete…
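To make the idea of choosing scales by minimizing reconstruction error concrete, the following is a minimal Python sketch for a single block. It is not the paper's CJSO or DSS procedure, whose details are not given in the text above. It only illustrates the generic least-squares fact that, for fixed 4-bit codes q, the scale minimizing ||w - s·q||² is s = ⟨w, q⟩ / ⟨q, q⟩, alternated with re-rounding. Assumptions: elements are FP4 E2M1 values grouped in 16-element blocks, the scale is kept as an ordinary float rather than rounded to the FP8 format stored by hardware, and the global per-tensor scale is omitted. All function names are hypothetical.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1, the element type used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # one shared scale per 16 elements


def round_to_fp4(x):
    """Round each entry to the nearest signed FP4 E2M1 value."""
    idx = np.abs(np.abs(x)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx]


def absmax_scale(block):
    """Baseline scale: map the largest magnitude in the block to 6.0."""
    amax = np.abs(block).max()
    return amax / FP4_GRID[-1] if amax > 0 else 1.0


def refit_scale(block, codes):
    """Least-squares scale for fixed codes: argmin_s ||block - s * codes||^2."""
    denom = float(codes @ codes)
    return float(block @ codes) / denom if denom > 0 else 1.0


def quantize_block(block, refine_steps=3):
    """Alternate between refitting the scale and re-rounding the codes.

    Simplification (assumption): the scale stays in full precision; a real
    NVFP4 pipeline would also constrain it to the stored FP8 format.
    """
    s = absmax_scale(block)
    codes = round_to_fp4(block / s)
    for _ in range(refine_steps):
        s = refit_scale(block, codes)    # best scale for the current codes
        codes = round_to_fp4(block / s)  # best codes for the current scale
    return s, codes


# Usage: compare reconstruction error against the abs-max baseline.
rng = np.random.default_rng(0)
w = rng.standard_normal(BLOCK)
s0 = absmax_scale(w)
baseline_err = np.sum((w - s0 * round_to_fp4(w / s0)) ** 2)
s, codes = quantize_block(w)
refined_err = np.sum((w - s * codes) ** 2)
print(f"abs-max error: {baseline_err:.4f}, refitted error: {refined_err:.4f}")
```

Each half-step can only lower or preserve the squared error, so the refitted result is never worse than the abs-max baseline in this unconstrained setting. Once the scale must be rounded to a low-precision stored format, that guarantee no longer holds automatically, which is one reason a discrete search over candidate scales can be useful.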

Code & Models

Repositories

steven-bao1/SOAR (GitHub)
