Normalized Architectures are Natively 4-Bit

Maxim Fishman; Brian Chmiel; Ron Banner; Daniel Soudry; Boris Ginsburg

arXiv:2605.06067·cs.LG·May 8, 2026

TL;DR

This paper shows that nGPT, a normalized architecture that constrains weights and hidden representations to the unit hypersphere, is inherently robust to 4-bit quantization, which enables stable NVFP4 training of large language models without extra interventions.

Contribution

The authors demonstrate that hypersphere-constrained architectures such as nGPT are naturally more robust to low-precision arithmetic, removing the need for interventions that standard models rely on, such as random Hadamard transforms and per-tensor scaling calculations.

Findings

01

nGPT enables stable 4-bit training without additional interventions.

02

Hypersphere constraint improves the effective signal-to-noise ratio in quantized models.

03

The robustness benefit grows with the hidden dimension, so it is most pronounced in larger models.
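
One way to read findings 02 and 03 together is through the variance of a sum. The sketch below is an illustrative model, not the paper's derivation: it assumes the signal terms of a length-d dot product share a single pairwise correlation `rho` (an equicorrelation assumption introduced here) and that the quantization noise terms are uncorrelated. Under those assumptions the signal-to-noise ratio stays flat in d when `rho` is zero and grows roughly linearly in d when `rho` is even slightly positive. The function names and the example variances are hypothetical.

```python
def sum_variance(d: int, var: float, rho: float) -> float:
    """Variance of a sum of d terms with equal variance and pairwise correlation rho."""
    return d * var * (1.0 + (d - 1) * rho)


def dot_product_snr(d: int, signal_var: float, noise_var: float, rho: float) -> float:
    """Signal power divided by noise power for a length-d dot product.

    Assumption (not from the paper): signal terms are equicorrelated with
    coefficient rho, and noise terms are uncorrelated.
    """
    signal_power = sum_variance(d, signal_var, rho)
    noise_power = sum_variance(d, noise_var, 0.0)
    return signal_power / noise_power


if __name__ == "__main__":
    # Illustrative variances only; they are not measurements from the paper.
    for d in (256, 1024, 4096):
        uncorrelated = dot_product_snr(d, 1.0, 0.1, rho=0.0)
        weakly_correlated = dot_product_snr(d, 1.0, 0.1, rho=0.01)
        print(f"d={d:5d}  rho=0: {uncorrelated:.1f}  rho=0.01: {weakly_correlated:.1f}")
```

In this toy model a weak correlation matters more as d grows, which is consistent with the claim that wider models benefit most.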

Abstract

Training large language models at 4-bit precision is critical for efficiency. We show that nGPT, an architecture that constrains weights and hidden representations to the unit hypersphere, is inherently more robust to low-precision arithmetic. This removes the need for interventions (such as applying random Hadamard transforms and performing per-tensor scaling calculations) to preserve model quality, and it enables stable end-to-end NVFP4 training. We validate this approach on both a 1.2B dense model and hybrid (Mamba-Transformer) MoE models of up to 3B/30B parameters. We trace this robustness to the dot product: while quantization noise remains largely uncorrelated in both standard and normalized architectures, the signal behaves differently. In nGPT, the hypersphere constraint enhances weak positive correlations among the element-wise products, leading to a constructive accumulation of…
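
To make the setting concrete, here is a minimal sketch of the two ingredients the abstract combines: projecting vectors onto the unit hypersphere, and rounding them to a 4-bit floating-point grid in blocks of 16 values. It is a simplified stand-in rather than the authors' implementation or the exact NVFP4 format. In particular, the block scale is kept in full precision here, and the helper names (`normalize`, `fake_quantize_fp4`, `dot`) are hypothetical.

```python
import math
import random

# Magnitudes representable by an FP4 E2M1 element (the sign is handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # elements that share one scale factor


def normalize(v):
    """Project a nonzero vector onto the unit hypersphere (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fake_quantize_fp4(v):
    """Round each block to the E2M1 grid using an absmax block scale.

    Simplification (an assumption of this sketch): the scale stays in full
    precision, so only the element rounding error is modeled.
    """
    out = []
    for start in range(0, len(v), BLOCK):
        block = v[start:start + BLOCK]
        amax = max(abs(x) for x in block)
        scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
        for x in block:
            # Pick the nearest representable magnitude, then restore the sign.
            mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
            out.append(math.copysign(mag * scale, x))
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    d = 1024
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    h = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    exact = dot(w, h)
    quantized = dot(fake_quantize_fp4(w), fake_quantize_fp4(h))
    print(f"exact={exact:+.5f}  quantized={quantized:+.5f}  error={quantized - exact:+.5f}")
```

Because every normalized vector has the same norm, its block maxima fall in a predictable range, which is one intuition for why extra per-tensor scaling might be unnecessary; the sketch does not reproduce the paper's measurements.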

Code & Models

Repositories

anonymous452026/ngpt-nvfp4 (GitHub)