MXNorm: Reusing MXFP block scales for efficient tensor normalisation

Callum McLean, Luke Y. Prince, Alexandre Payot, Paul Balança, and Carlo Luschi

arXiv:2603.13180 · cs.LG · March 16, 2026

TL;DR

MXNorm is a normalization method that estimates the RMS from the block scales already computed during the MXFP8 cast. This cuts the size of the reduction needed for normalization by 32x and speeds up large language model training with minimal loss of accuracy.

Contribution

The paper introduces MXNorm, a drop-in replacement for RMSNorm that uses MXFP block scales for efficient tensor normalization in deep learning models.

Findings

01. Achieves up to 2.4x kernel speedup with minimal accuracy loss.

02. Enables a 32x decrease in the size of the reduction needed for normalization.

03. Provides practical speedups in large language model training.

Abstract

Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelerators that use increasingly low-precision number formats. However, improvements in matrix multiplication performance have far outstripped improvements in performance on reductions and elementwise computations, which are still being performed in higher precision. In this work, we propose MXNorm, a drop-in replacement for RMSNorm that estimates the RMS using only the block scales calculated as part of the MXFP8 cast and enables a 32x decrease in the size of reduction needed for normalization. We validate our approximation method on pre-training of Llama 3 models of 125M, 1B and 8B parameters, finding minimal loss of training accuracy compared to a baseline using RMSNorm with MXFP8 matmuls. We also show practical kernel speedups using only…
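The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the usual MX layout of 32 elements per block with one power-of-two scale per block, and an E4M3 element format whose largest binade has exponent 8. The function names and the `calib` factor are hypothetical; the abstract does not give the exact estimator or any correction constant the paper uses.

```python
import numpy as np

BLOCK = 32      # assumed MX block size: elements sharing one scale
E4M3_EMAX = 8   # exponent of the largest E4M3 binade (448 = 1.75 * 2**8)


def mx_block_scales(x):
    """Power-of-two scale for each block of 32 values along the last axis."""
    *lead, d = x.shape
    assert d % BLOCK == 0, "last axis must be a multiple of the block size"
    blocks = x.reshape(*lead, d // BLOCK, BLOCK)
    amax = np.max(np.abs(blocks), axis=-1)
    # frexp gives amax = m * 2**e with m in [0.5, 1), so floor(log2(amax)) = e - 1
    _, e = np.frexp(amax)
    return np.ldexp(np.ones_like(amax), e - 1 - E4M3_EMAX)


def rms_exact(x):
    """Reference RMS: reduces over every element of the last axis."""
    return np.sqrt(np.mean(np.square(x), axis=-1))


def rms_from_scales(scales, calib=1.0):
    """Hypothetical estimator: RMS of the block scales times a calibration
    factor. The reduction runs over d / 32 values instead of d."""
    return calib * np.sqrt(np.mean(np.square(scales), axis=-1))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1024))
scales = mx_block_scales(x)

# Calibrate once against the exact RMS; a real kernel would use a fixed constant.
calib = float(np.mean(rms_exact(x) / rms_from_scales(scales)))
estimate = rms_from_scales(scales, calib)

print("elements reduced per row (exact):   ", x.shape[-1])
print("elements reduced per row (estimate):", scales.shape[-1])
print("max relative error:", np.max(np.abs(estimate / rms_exact(x) - 1.0)))
```

Because the scales are already produced by the MXFP8 cast, the estimate needs no extra pass over the full tensor. How well a single calibration constant carries over to real activations is an empirical question this sketch does not answer.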

Taxonomy

Topics: Cryptography and Residue Arithmetic · Numerical Methods and Algorithms · Tensor decomposition and applications