Finer is Better (with the Right Scaling)

Clemens Schaefer; Gil Tabak

arXiv:2605.08565·cs.LG·May 12, 2026

TL;DR

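This paper investigates the paradox in which finer quantization block sizes degrade LLM quality under standard abs-max scaling. It shows that the degradation is not inherent to finer granularity: keeping the scaling factor from underflowing to zero and applying targeted interventions such as 4-over-6 mitigate it.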

Contribution

The study identifies the cause of the block size paradox and proposes algorithmic solutions that enable standard quantization formats to outperform or match custom formats.

Findings

01 Keeping the scaling factor from underflowing to zero mitigates localized errors.

02 Algorithmic interventions like 4-over-6 improve quantization geometry.

03 Finer block sizes with the right methods strictly reduce mean squared error.
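To make the block-scaling findings concrete, here is a minimal pure-Python sketch of abs-max block quantization onto the FP4 (E2M1) magnitude grid. It is not the paper's implementation. The function names are hypothetical, the scale is kept as an unrounded float, and the `targets=(6.0, 4.0)` option reflects one assumed reading of 4-over-6: per block, map the abs-max to either 6 or 4 and keep whichever choice gives the lower squared error.

```python
import random

# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block, target_max):
    """Scale the block so its abs-max lands on target_max, round each
    element to the nearest FP4 magnitude, then scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max  # real-valued scale (simplifying assumption)
    out = []
    for x in block:
        mag = min(abs(x) / scale, FP4_GRID[-1])
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append((q if x >= 0 else -q) * scale)
    return out


def sq_err(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(values, block_size, targets=(6.0,)):
    """Quantize block by block; with several targets, keep the candidate
    with the lowest squared error for each block (assumed 4-over-6 rule)."""
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        candidates = [quantize_block(block, t) for t in targets]
        out.extend(min(candidates, key=lambda c: sq_err(block, c)))
    return out


def mse(values, block_size, targets=(6.0,)):
    """Mean squared quantization error over the whole tensor."""
    return sq_err(values, quantize(values, block_size, targets)) / len(values)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(1024)]
    for bs in (64, 32, 16, 8):
        print(bs, mse(data, bs), mse(data, bs, targets=(6.0, 4.0)))
```

Because the selection is made per block and always includes the plain abs-max candidate, the two-target variant can never have a higher error than abs-max scaling in this sketch. The sketch does not model a low-precision scale format, so it should not be expected to reproduce the block-size paradox itself.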

Abstract

Microscaling is a critical technique for preserving the quality of Large Language Models (LLMs) quantized to ultra-low precision formats. Intuitively, finer block sizes should yield lower quantization error; however, a paradox recently identified in the literature demonstrates that standard abs-max scaling can actually degrade model quality as block sizes shrink. In this work, we investigate the underlying mechanics of this phenomenon. We demonstrate that this degradation is not an inherent limitation of finer granularity, but is primarily driven by heavy-tailed tensor distributions interacting poorly with the coarse upper quantization bins of the FP4 element format. Specifically, we show that i) preventing the scaling factor from underflowing to zero mitigates localized errors, ii) targeted algorithmic interventions like the 4-over-6 methodology effectively correct the quantization…
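The scale-underflow point in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: `MIN_SCALE` stands for the smallest positive value of a hypothetical scale format (2^-9, the smallest positive FP8 E4M3 subnormal, is used here), only flush-to-zero is modeled and all other scale rounding is ignored, and the helper names are invented for this example.

```python
# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Assumption: smallest positive value the stored scale can take.
MIN_SCALE = 2.0 ** -9


def store_scale(scale, guard):
    """Model scale storage. Without the guard, a scale below half of
    MIN_SCALE rounds to zero; with the guard it is clamped to MIN_SCALE."""
    if scale <= 0.0:
        return 0.0
    if not guard and scale < MIN_SCALE / 2:
        return 0.0  # underflow: the whole block will be zeroed
    return max(scale, MIN_SCALE)


def dequantized_block(block, guard):
    """Abs-max quantize a block to FP4 with a stored scale, then dequantize."""
    amax = max(abs(x) for x in block)
    scale = store_scale(amax / FP4_GRID[-1], guard)
    if scale == 0.0:
        return [0.0] * len(block)
    out = []
    for x in block:
        mag = min(abs(x) / scale, FP4_GRID[-1])
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append((q if x >= 0 else -q) * scale)
    return out


def sq_err(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    small = [0.004, -0.002, 0.001, 0.0]  # a block of tiny values
    for guard in (False, True):
        deq = dequantized_block(small, guard)
        print(guard, deq, sq_err(small, deq))
```

In this toy setting the unguarded path erases the entire block, while the guarded path keeps a coarse but nonzero reconstruction. Smaller blocks are more likely to contain only tiny values, which is one plausible reason the guard matters more as block sizes shrink.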
