AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

Seonggon Kim; Alireza Khodamoradi; Pranathi Vasireddy; Kristof Denolf; Eunhyeok Park

arXiv:2604.02525 · cs.LG · May 11, 2026

TL;DR

AdaHOP is an adaptive Hadamard transform strategy for low-precision training of large language models. Instead of applying Hadamard transforms uniformly, it picks a transform or outlier-handling strategy for each matrix multiplication according to the outlier patterns of its operands.

Contribution

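The paper systematically studies weights, activations, and gradients in LLM training, identifies three stable outlier patterns (Row-wise, Column-wise, and None), and proposes an adaptive transform approach that selects a distinct strategy for each outlier pattern pair in a matrix multiplication.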

Findings

1. Achieves up to 3.6X memory compression
2. Provides 1.46X end-to-end training speedup over BF16
3. Enables training from scratch at MXFP4 precision with BF16-level quality

Abstract

Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns, Row-wise, Column-wise, and None, and show that each outlier pattern pair in matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP, Adaptive Hadamard transform with Outlier-Pattern-aware strategy, which applies Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers, and selectively applies Outlier Extraction (OE) that…
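
The abstract's central claim, that an inner-dimension Hadamard transform helps only when its mixing direction matches the operand's outlier structure, can be illustrated with a small toy. The sketch below is not the paper's implementation: the matrices, the sizes, and the helper names (`hadamard`, `matmul`, `max_abs`) are assumptions made for illustration, and how the two toy cases map onto the paper's Row-wise and Column-wise labels is not stated in this excerpt. It shows that rotating both operands along the inner dimension leaves the product unchanged, that the rotation shrinks the largest magnitude when each row holds a single outlier, and that it enlarges the largest magnitude when one entire row is a constant outlier.

```python
import math

def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix of size n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[x * scale for x in row] for row in h]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def max_abs(a):
    return max(abs(x) for row in a for x in row)

M, K, N = 4, 8, 3   # toy sizes (assumed); K is the inner dimension
OUTLIER = 100.0     # toy outlier magnitude (assumed)

def small(i, j):
    """Deterministic small background values in [-1, 1]."""
    return ((i * 7 + j * 3) % 5 - 2) * 0.5

# Case 1: every row has one outlier, all at the same inner-dimension index.
a_single = [[OUTLIER if j == 3 else small(i, j) for j in range(K)] for i in range(M)]
# Case 2: one whole row is a constant outlier (an extreme, worst-case choice).
a_fullrow = [[OUTLIER if i == 1 else small(i, j) for j in range(K)] for i in range(M)]
b = [[float((i + 2 * j) % 3 - 1) for j in range(N)] for i in range(K)]

h = hadamard(K)

# Rotating both operands along the inner dimension: (A H)(H^T B) = A B,
# because H is orthonormal (H H^T = I).
reference = matmul(a_single, b)
rotated = matmul(matmul(a_single, h), matmul(transpose(h), b))
product_error = max(abs(x - y) for r, s in zip(reference, rotated) for x, y in zip(r, s))

single_before, single_after = max_abs(a_single), max_abs(matmul(a_single, h))
fullrow_before, fullrow_after = max_abs(a_fullrow), max_abs(matmul(a_fullrow, h))

print("product error after rotation:", product_error)
print("one outlier per row: max |x|", single_before, "->", single_after)
print("constant outlier row: max |x|", fullrow_before, "->", fullrow_after)
```

In the first case the rotation spreads each row's single outlier over all K entries, so the largest magnitude falls by roughly a factor of sqrt(K). In the second case the constant row is concentrated into one entry, so the largest magnitude grows by the same factor. This toy tracks only the largest magnitude, not quantization error itself, and it does not model MXFP4 or the paper's Outlier Extraction step.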
