# Metis: Training LLMs with FP4 Quantization

**Authors:** Hengjie Cao, Mengyi Chen, Yifeng Yang, Ruijun Huang, Fang Dong, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Yuan Cheng, Fan Wu, Fan Yang, Tun Lu, Ning Gu, Li Shang

arXiv: 2509.00404 · 2025-10-01

## TL;DR

Metis is a spectral-domain quantization method for low-bit training of large language models. It partitions anisotropic singular value spectra into narrower sub-distributions that are quantized independently, which enables FP4 training of LLaMA-3 8B with a 0.4% training loss gap relative to BF16.

## Contribution

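This work presents Metis, a spectral-domain quantization framework that preserves spectral structure and reduces quantization error in low-bit LLM training. In the authors' experiments it achieves lower loss and higher downstream accuracy than their own implementation of Nvidia's FP4 recipe.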

## Key findings

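- Enables robust W4A4G4 training (FP4 weights, activations, and gradients) on LLaMA-3 8B trained with 100B tokens
- Yields a 0.4% training loss gap and a 0.1% degradation in downstream accuracy relative to BF16
- Achieves lower loss, higher downstream accuracy, and lower computational overhead than the authors' implementation of Nvidia's FP4 recipe (which was not yet publicly released)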

## Abstract

This work identifies anisotropy in the singular value spectra of parameters, activations, and gradients as the fundamental barrier to low-bit training of large language models (LLMs). These spectra are dominated by a small fraction of large singular values, inducing wide numerical ranges that cause quantization bias and severe spectral distortion, ultimately degrading training performance. This work presents Metis, a spectral-domain quantization framework that partitions anisotropic spectra into narrower sub-distributions for independent quantization, thereby reducing errors and preserving spectral structure. To minimize overhead, Metis leverages two key properties of the dominant spectral subspace: preservation via sparsely random sampling and preservation via random projection, reducing decomposition cost to a negligible level. On LLaMA-3 8B trained with 100B tokens, Metis enables robust W4A4G4 training with FP4 quantization of weights, activations, and gradients, yielding only a 0.4% training loss gap and a 0.1% degradation in downstream accuracy relative to BF16. Beyond matching BF16 fidelity, Metis also surpasses our implementation of Nvidia's recently announced (yet to be publicly released) FP4 recipe, consistently achieving lower loss and higher downstream accuracy while incurring significantly lower computational overhead. The code implementation for Metis is available at: https://anonymous.4open.science/r/Metis-quantization-644B.
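The idea of splitting a spectrum and quantizing the parts independently can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the E2M1 value grid for FP4, per-tensor scaling, a two-way split into a dominant rank-`k` part and a residual, singular values kept in high precision, and the helper names `fake_quant_fp4`, `randomized_top_k`, and `spectral_split_quant`. The dominant subspace is estimated with a standard Gaussian random projection (randomized range finder); the sparse random sampling property mentioned in the abstract is not covered.

```python
import numpy as np

# Magnitudes representable in an E2M1 FP4 format.
# Assumption: the summary does not say which FP4 variant Metis uses.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_quant_fp4(x):
    """Simulate FP4: scale so max|x| maps to 6, round to the nearest grid value, rescale.

    Assumption: one scale per tensor (real recipes typically use finer-grained scales).
    """
    amax = np.abs(x).max()
    if amax == 0:
        return x.copy()
    scale = amax / FP4_GRID[-1]
    mag = np.abs(x) / scale
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale


def randomized_top_k(w, k, oversample=8, seed=0):
    """Approximate the top-k singular triplets of w with a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((w.shape[1], k + oversample))
    q, _ = np.linalg.qr(w @ omega)  # orthonormal basis for the sampled range of w
    u_small, s, vt = np.linalg.svd(q.T @ w, full_matrices=False)
    u = q @ u_small
    return u[:, :k], s[:k], vt[:k]


def spectral_split_quant(w, k):
    """Hypothetical split: quantize dominant factors and the residual independently."""
    u, s, vt = randomized_top_k(w, k)
    residual = w - (u * s) @ vt
    # Singular values s stay in high precision (an assumption of this sketch).
    return (fake_quant_fp4(u) * s) @ fake_quant_fp4(vt) + fake_quant_fp4(residual)


# Synthetic anisotropic matrix: a few large singular values plus dense unit noise.
rng = np.random.default_rng(0)
m, n, k = 128, 128, 4
u0, _ = np.linalg.qr(rng.standard_normal((m, k)))
v0, _ = np.linalg.qr(rng.standard_normal((n, k)))
w = (u0 * np.array([400.0, 200.0, 100.0, 50.0])) @ v0.T + rng.standard_normal((m, n))


def rel_err(approx):
    """Relative Frobenius reconstruction error against w."""
    return np.linalg.norm(approx - w) / np.linalg.norm(w)


print("direct FP4 relative error:", rel_err(fake_quant_fp4(w)))
print("split  FP4 relative error:", rel_err(spectral_split_quant(w, k)))
```

The two printed errors depend on the synthetic spectrum, the choice of `k`, and the scaling scheme, so the sketch shows the data flow rather than reproducing any number reported in the paper. The point of the split is that the residual, once the dominant directions are removed, spans a narrower numerical range and gets its own quantization scale.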

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00404/full.md

## Figures

33 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00404/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/2509.00404/full.md

---
Source: https://tomesphere.com/paper/2509.00404