AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
Seonggon Kim, Alireza Khodamoradi, Pranathi Vasireddy, Kristof Denolf, Eunhyeok Park

TL;DR
AdaHOP selects a Hadamard transform or outlier-handling strategy for each matrix multiplication based on the outlier patterns of its operands, instead of applying one transform uniformly, improving both the efficiency and the accuracy of low-precision training for large language models.
Contribution
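The paper systematically studies weights, activations, and gradients in LLM training, identifies three stable outlier patterns (Row-wise, Column-wise, and None), and proposes an adaptive transform strategy that matches each operand pair to a suitable transform or outlier-handling method, improving the stability and speed of low-precision training.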
Findings
Achieves up to 3.6X memory compression
Provides 1.46X end-to-end training speedup over BF16
Enables training from scratch at MXFP4 precision with BF16-level quality
Abstract
Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns, Row-wise, Column-wise, and None, and show that each outlier pattern pair in matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP, Adaptive Hadamard transform with Outlier-Pattern-aware strategy, which applies Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers, and selectively applies Outlier Extraction (OE) that…
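The abstract's central claim, that Hadamard smoothing helps only when its mixing direction lines up with the operand's outlier structure, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a Sylvester-constructed orthonormal Hadamard matrix, synthetic operands, and a tensor-level peak-to-RMS ratio as a crude proxy for quantization difficulty. The names `hadamard`, `crest`, `X_col`, and `X_row` are hypothetical, and the labels "column outlier" and "row outlier" describe the synthetic data only, not necessarily the paper's exact definitions.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def crest(t: np.ndarray) -> float:
    """Peak-to-RMS ratio of a tensor; an assumed proxy, not the paper's metric."""
    return float(np.abs(t).max() / np.sqrt(np.mean(t ** 2)))

rng = np.random.default_rng(0)
m, k, n = 32, 64, 16            # X is (m, k), W is (k, n); k is the inner dimension
H = hadamard(k)
W = rng.standard_normal((k, n))

# Synthetic left operands (assumptions for illustration only).
X_col = rng.standard_normal((m, k))
X_col[:, 5] = 100.0                              # one inner-dimension index is large in every row
X_row = rng.standard_normal((m, k))
X_row[7, :] = 100.0 * rng.standard_normal(k)     # one whole row is large

results = {}
for name, X in (("column outlier", X_col), ("row outlier", X_row)):
    Xr, Wr = X @ H, H.T @ W                      # rotate both operands along the inner dimension
    assert np.allclose(Xr @ Wr, X @ W)           # H @ H.T = I, so the product is unchanged
    results[name] = (crest(X), crest(Xr))
    print(f"{name}: crest before = {results[name][0]:.2f}, after = {results[name][1]:.2f}")
```

In this sketch the rotation spreads the single large inner-dimension entry of `X_col` across all `k` positions, so its peak-to-RMS ratio drops sharply. The large row of `X_row` already spans the whole inner dimension, so mixing along that dimension leaves the ratio in the same range. That asymmetry is the kind of case where, according to the abstract, a different transform or outlier-handling strategy is needed.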
