Adaptive Block-Scaled Data Types

Jack Cook; Hyemin S. Lee; Kathryn Le; Junxian Guo; Giovanni Traverso; Anantha P. Chandrakasan; Song Han

arXiv:2603.28765·cs.CL·March 31, 2026

TL;DR

This paper introduces Adaptive Block-Scaled Data Types, most notably IF4, which chooses between FP4 and INT4 for each group of 16 values to improve the accuracy and efficiency of 4-bit quantization for language models.

Contribution

The paper proposes IF4, a data type that adapts to the distribution of its input values and outperforms existing formats in both quantization accuracy and hardware efficiency.

Findings

01 IF4 achieves lower quantization loss during training.

02 IF4 yields higher accuracy in post-training quantization.

03 An efficient IF4 MAC unit is demonstrated for hardware implementation.

Abstract

NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in each group of 16 values. In this work, we leverage this insight to design new Adaptive Block-Scaled Data Types that can adapt to the distribution of their input values. For four-bit quantization, our proposed IF4 (Int/Float 4) data type selects between FP4 and INT4 representations for each group of 16 values, which are then scaled by an E4M3 scale factor as is done with NVFP4. The selected data type is denoted using the scale factor's sign bit, which is currently unused in NVFP4, and we apply the…
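
The per-group selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the symmetric INT4 grid of -7..7, the absolute-maximum scaling rule, and the squared-error selection criterion are all assumptions made for illustration, and the scale is kept as an ordinary float rather than rounded to E4M3. The `use_int` flag stands in for the information the abstract says is carried in the scale factor's sign bit; which polarity means which format is not stated in the text above.

```python
import math

# Magnitudes representable by FP4 (E2M1); the sign is handled separately.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# ASSUMPTION: a symmetric INT4 grid with magnitudes 0..7.
INT4_GRID = [float(i) for i in range(8)]


def quantize_group(values, grid):
    """Quantize then dequantize one group against a magnitude grid.

    ASSUMPTION: the scale maps the group's absolute maximum onto the
    largest grid value. A real format would round this scale to E4M3.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values), 1.0
    scale = amax / grid[-1]
    dequantized = []
    for v in values:
        # Round the scaled magnitude to the nearest grid point.
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))
        dequantized.append(math.copysign(mag * scale, v))
    return dequantized, scale


def squared_error(original, approx):
    return sum((x - y) ** 2 for x, y in zip(original, approx))


def quantize_if4_sketch(values):
    """Pick whichever of FP4 or INT4 gives lower squared error (hypothetical rule)."""
    fp_vals, fp_scale = quantize_group(values, FP4_GRID)
    int_vals, int_scale = quantize_group(values, INT4_GRID)
    fp_err = squared_error(values, fp_vals)
    int_err = squared_error(values, int_vals)
    if int_err < fp_err:
        return {"use_int": True, "scale": int_scale,
                "dequantized": int_vals, "error": int_err}
    return {"use_int": False, "scale": fp_scale,
            "dequantized": fp_vals, "error": fp_err}


if __name__ == "__main__":
    # Evenly spaced values suit the uniform INT4 grid.
    uniform_group = [i / 7 for i in range(-7, 8)] + [1.0]
    # Values clustered near zero with a few large ones suit FP4.
    peaked_group = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 0.0,
                    -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0, 0.0]
    for name, group in [("uniform", uniform_group), ("peaked", peaked_group)]:
        result = quantize_if4_sketch(group)
        print(name, "-> INT4" if result["use_int"] else "-> FP4")
```

Under these assumptions, evenly spaced groups tend to favour the uniform INT4 grid, while groups with many small values and a few large ones tend to favour FP4, whose grid is denser near zero.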

Code & Models

Repositories

mit-han-lab/fouroversix (GitHub)
