Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Andrei Panferov; Erik Schultheis; Soroush Tabesh; Dan Alistarh

arXiv:2601.22813·cs.LG·February 2, 2026

TL;DR

Quartet II introduces a novel unbiased quantization method, MS-EDEN, enabling more accurate and efficient fully-quantized LLM pre-training in NVFP4 format on NVIDIA Blackwell GPUs.

Contribution

It presents MS-EDEN, a new unbiased quantization routine, and integrates it into Quartet II for improved LLM pre-training in NVFP4, achieving better gradient estimation and speed.

Findings

01. MS-EDEN reduces quantization error by over 2x compared to stochastic rounding.

02. Quartet II achieves up to 4.2x speedup over BF16 on NVIDIA Blackwell GPUs.

03. Validated on LLMs with up to 1.9B parameters, maintaining high accuracy.

Abstract

The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation by stochastic rounding (SR), losing noticeable accuracy relative to standard FP16 and FP8 training. In this paper, we improve the state of the art for quantized training in NVFP4 via a novel unbiased quantization routine for micro-scaled formats, called MS-EDEN, that has more than 2x lower quantization error than SR. We integrate it into a novel fully-NVFP4 quantization scheme for linear layers, called Quartet II. We show analytically that Quartet II achieves consistently better gradient estimation across all major matrix…
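The abstract uses stochastic rounding (SR) as the unbiased baseline that MS-EDEN is compared against. As a rough illustration of what "unbiased" means here, the sketch below rounds a scalar onto the 4-bit E2M1 magnitude grid, once deterministically and once stochastically. It covers the SR baseline only and is not MS-EDEN or the paper's implementation; it also ignores the per-block scale factors that NVFP4 applies, and the names `FP4_GRID`, `round_nearest`, `stochastic_round`, and `mean_sr` are hypothetical helpers chosen for this example.

```python
import bisect
import random

# Magnitudes representable by a 4-bit E2M1 float (the element type behind NVFP4).
# Assumption for this sketch: block scales are ignored, so values live directly on this grid.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_nearest(x):
    """Deterministic round-to-nearest onto the signed grid (biased for a fixed x)."""
    mag = min(abs(x), FP4_GRID[-1])
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return nearest if x >= 0 else -nearest


def stochastic_round(x, rng):
    """Round x to one of its two neighbouring grid points so that E[q] = x.

    Values with |x| > 6 are clipped first, so unbiasedness only holds inside the grid range.
    """
    mag = min(abs(x), FP4_GRID[-1])
    hi_idx = bisect.bisect_left(FP4_GRID, mag)
    if FP4_GRID[hi_idx] == mag:
        q = mag  # already representable, nothing to randomize
    else:
        lo, hi = FP4_GRID[hi_idx - 1], FP4_GRID[hi_idx]
        p_up = (mag - lo) / (hi - lo)  # closer to hi means more likely to round up
        q = hi if rng.random() < p_up else lo
    return q if x >= 0 else -q


def mean_sr(x, n=100_000, seed=0):
    """Empirical mean of n stochastic roundings of x."""
    rng = random.Random(seed)
    return sum(stochastic_round(x, rng) for _ in range(n)) / n


if __name__ == "__main__":
    x = 2.4
    print("round-to-nearest:", round_nearest(x))
    print("mean of stochastic rounding:", mean_sr(x))
```

Round-to-nearest always maps 2.4 to 2.0, so its error never averages out, whereas the stochastic version returns 2.0 or 3.0 with probabilities 0.6 and 0.4 and is correct in expectation. The price is higher per-sample variance, which is the quantization error the abstract says MS-EDEN reduces by more than 2x.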

Code & Models

Models

daslab-testing/CloverLM (Hugging Face model, 1.4k downloads, 1 like)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Generative Adversarial Networks and Image Synthesis