Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Roberto L. Castro; Andrei Panferov; Soroush Tabesh; Oliver Sieberling; Jiale Chen; Mahdi Nikdan; Saleh Ashkboos; Dan Alistarh

arXiv:2505.14669 · cs.LG · January 16, 2026


TL;DR

This paper introduces Quartet, a method for training large language models with all major computations (the linear layers) in FP4 precision. It uses native hardware support to improve efficiency while preserving accuracy, and it performs competitively against training in higher precisions.

Contribution

The paper studies hardware-supported FP4 training and presents Quartet, a method for accurate, end-to-end low-precision training of LLMs, implemented with optimized CUDA kernels.
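The following is a rough illustration of what FP4 arithmetic involves. It does not reproduce the Quartet kernel or its quantizer. The sketch "fake-quantizes" values into the FP4 E2M1 format, whose non-negative values are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. Each block of 32 values shares one power-of-two scale, in the spirit of MXFP4. Several details are assumptions of this sketch: the block size, the scale rule (smallest power of two that keeps the block maximum within range), and the tie-breaking. The helper names are also hypothetical.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude (min() keeps the first of equal distances),
    a simplification of round-to-nearest-even.
    """
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def quantize_mxfp4_block(block: list[float]) -> tuple[float, list[float]]:
    """Return (power-of-two scale, E2M1 codes) for one block (hypothetical scale rule)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Smallest power of two that maps the block maximum to at most 6.
    scale = 2.0 ** math.ceil(math.log2(amax / E2M1_MAX))
    return scale, [round_to_e2m1(v / scale) for v in block]


def fake_quantize(values: list[float], block_size: int = 32) -> list[float]:
    """Quantize to FP4 block by block, then dequantize back to floats."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_mxfp4_block(values[start:start + block_size])
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    print(fake_quantize([0.1, -0.7, 3.2, 6.0]))  # [0.0, -0.5, 3.0, 6.0]
```

The format is compact because only the 4-bit codes plus one scale per block need to be stored. The dequantized floats returned here are just a convenient way to inspect the rounding error.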

Findings

01

FP4 training can match FP16 and FP8 accuracy levels.

02

A new low-precision scaling law quantifies performance trade-offs.

03

Quartet achieves competitive training efficiency on NVIDIA's Blackwell architecture.

Abstract

Training large language models (LLMs) directly in low precision offers a way to address computational costs by improving both throughput and energy efficiency. To this end, NVIDIA's recent Blackwell architecture supports very low-precision operations using FP4 variants. Yet current algorithms for training LLMs in FP4 precision face significant accuracy degradation and often rely on mixed-precision fallbacks. In this paper, we investigate hardware-supported FP4 training and introduce a new approach for accurate, end-to-end FP4 training with all the major computations (i.e., linear layers) in low precision. Through extensive evaluations on Llama-type models, we reveal a new low-precision scaling law that quantifies performance trade-offs across bit-widths and training setups. Guided by this investigation, we design an "optimal" technique in terms of…
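The abstract's central constraint is that all linear layers run in low precision. The sketch below shows what that means numerically for y = xW^T, under these assumptions:

- Every row of x and of W is rounded to E2M1 values.
- Each block of 32 entries along the shared inner dimension gets one power-of-two scale.
- The products are summed in ordinary floating point.

The names `fp4_fake_quant` and `fp4_linear` are hypothetical. The code only simulates the arithmetic. It does not reproduce Quartet's forward/backward scheme or its CUDA kernels.

```python
import math

# Non-negative FP4 E2M1 magnitudes.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fp4_fake_quant(vec: list[float], block_size: int = 32) -> list[float]:
    """Round vec to E2M1 values with one power-of-two scale per block (hypothetical scheme).

    Values beyond the block's range saturate at +/-6 * scale; ties go to the smaller magnitude.
    """
    out: list[float] = []
    for s in range(0, len(vec), block_size):
        block = vec[s:s + block_size]
        amax = max(abs(v) for v in block)
        scale = 2.0 ** math.ceil(math.log2(amax / 6.0)) if amax > 0 else 1.0
        for v in block:
            mag = min(abs(v) / scale, 6.0)
            q = min(E2M1_GRID, key=lambda g: abs(g - mag))
            out.append(math.copysign(q, v) * scale)
    return out


def fp4_linear(x_rows: list[list[float]], w_rows: list[list[float]],
               block_size: int = 32) -> list[list[float]]:
    """Compute y = x @ W^T with x rows and W rows fake-quantized along the inner dimension.

    Python floats stand in for the higher-precision accumulator of a real FP4 GEMM.
    """
    xq = [fp4_fake_quant(r, block_size) for r in x_rows]
    wq = [fp4_fake_quant(r, block_size) for r in w_rows]
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in wq] for xr in xq]


if __name__ == "__main__":
    print(fp4_linear([[1.0, 2.0]], [[0.5, 3.0], [1.0, -1.0]]))  # [[6.5, -1.0]]
```

Both operands are quantized along the same inner dimension so that each block's scale factors out of its partial dot product. Under these assumptions, a real kernel can multiply the 4-bit codes directly and apply the scales afterwards.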


Code & Models

Repositories

ist-daslab/quartet
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications