AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, Jungwook Choi

TL;DR
This paper introduces AMXFP4, an asymmetric 4-bit floating-point format that improves large language model inference accuracy by effectively managing activation outliers without calibration, outperforming existing MXFP4 and rotation-based methods.
Contribution
The paper proposes AMXFP4, a novel asymmetric floating-point format for 4-bit LLM inference that addresses activation outliers and improves accuracy without calibration overhead.
Findings
AMXFP4 outperforms MXFP4 by 3% on VQA.
AMXFP4 exceeds rotation-based methods by 1.6% on CSQA.
AMXFP4 surpasses recent commercial MXFP4 variants.
Abstract
As large language models (LLMs) grow in parameter size and context length, computation precision has been reduced from 16-bit to 4-bit to improve inference efficiency. However, this reduction causes accuracy degradation due to activation outliers. Rotation-based INT4 methods address this via matrix calibration, but they introduce multi-hour overheads and leave key computations in full precision. Microscaling (MX) floating-point (FP) formats offer fine-grained representation with a shared scale, enabling fully quantized matrix multiplications through direct casting without calibration. However, existing research shows unsatisfactory empirical results for MXFP4 inference, and the robustness of MX formats remains largely unexplored. In this work, we uncover the fundamental tradeoffs of the MX format: while it effectively suppresses activation outliers, it does so at the cost of increased…
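The abstract's description of MX formats (fine-grained values sharing one scale, applied by direct casting with no calibration) can be illustrated with a minimal sketch of plain symmetric MXFP4. It is not the paper's asymmetric AMXFP4 scheme, whose details are cut off in the abstract above. The sketch assumes the OCP MX v1.0 layout: blocks of 32 elements, a power-of-two shared scale, and FP4 E2M1 elements whose magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The function and constant names (`round_to_fp4`, `mxfp4_quantize_block`, `mxfp4_fake_quantize`, `BLOCK_SIZE`) are hypothetical. Two simplifications apply: ties round toward the smaller magnitude, and the E8M0 scale's exponent range is not enforced.

```python
import math

# FP4 E2M1 representable magnitudes (OCP MX v1.0 element format).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2      # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32   # MX block size: 32 elements share one scale


def round_to_fp4(x: float) -> float:
    """Round a scaled value to the nearest E2M1 value, saturating at +/-6.

    Simplification (assumption): ties go to the smaller magnitude,
    because min() returns the first of equally distant candidates.
    """
    mag = min(abs(x), FP4_E2M1_VALUES[-1])
    nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def mxfp4_quantize_block(block):
    """Direct-cast one block to a shared power-of-two exponent plus FP4 elements."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - FP4_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_fp4(v / scale) for v in block]


def mxfp4_fake_quantize(values, block_size=BLOCK_SIZE):
    """Quantize then dequantize a vector block by block (simulated MXFP4)."""
    out = []
    for start in range(0, len(values), block_size):
        shared_exp, elems = mxfp4_quantize_block(values[start:start + block_size])
        out.extend(q * 2.0 ** shared_exp for q in elems)
    return out


# Usage: one large outlier in the first block, small values everywhere else.
acts = [100.0] + [0.1] * 31 + [0.1] * 32
deq = mxfp4_fake_quantize(acts)
print(deq[0], deq[1], deq[32])
```

In this toy input the outlier forces a coarse scale (2**4) on its own block. Its 31 small neighbors therefore flush to zero, while the second block picks its own fine scale (2**-6) and keeps its values near 0.1. Confining an outlier's damage to one 32-element block is why MX formats can be cast directly without the calibration that rotation-based INT4 methods need.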
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Model Reduction and Neural Networks
