AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference

Janghwan Lee; Jiwoong Park; Jinseok Kim; Yongjik Kim; Jungju Oh; Jinwook Oh; Jungwook Choi

arXiv:2411.09909 · cs.AI · June 2, 2025

TL;DR

This paper introduces AMXFP4, an asymmetric 4-bit floating-point format that improves large language model inference accuracy by effectively managing activation outliers without calibration, outperforming existing MXFP4 and rotation-based methods.

Contribution

The paper proposes AMXFP4, a novel asymmetric floating-point format for 4-bit LLM inference that addresses activation outliers and improves accuracy without calibration overhead.

Findings

01 AMXFP4 outperforms MXFP4 by 3% on VQA.

02 AMXFP4 exceeds rotation-based methods by 1.6% on CSQA.

03 AMXFP4 surpasses recent commercial MXFP4 variants.

Abstract

As large language models (LLMs) grow in parameter size and context length, computation precision has been reduced from 16-bit to 4-bit to improve inference efficiency. However, this reduction causes accuracy degradation due to activation outliers. Rotation-based INT4 methods address this via matrix calibration, but they introduce multi-hour overheads and leave key computations in full precision. Microscaling (MX) floating-point (FP) formats offer fine-grained representation with a shared scale, enabling fully quantized matrix multiplications through direct casting without calibration. However, existing research shows unsatisfactory empirical results for MXFP4 inference, and the robustness of MX formats remains largely unexplored. In this work, we uncover the fundamental tradeoffs of the MX format: while it effectively suppresses activation outliers, it does so at the cost of increased…
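
To make "direct casting with a shared scale" concrete, here is a minimal sketch of baseline symmetric MXFP4 fake-quantization. It is not the paper's AMXFP4 method. It assumes the usual MX conventions, which the abstract does not spell out: groups of 32 elements, a power-of-two shared scale, and FP4 E2M1 element values. The function name `mxfp4_fake_quant` and the nearest-value rounding rule are illustrative choices.

```python
import numpy as np

# Non-negative magnitudes representable in FP4 E2M1 (assumed element format).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_EMAX = 2  # exponent of the largest E2M1 value: 6 = 1.5 * 2**2


def mxfp4_fake_quant(x, group_size=32):
    """Quantize then dequantize x group-wise (hypothetical helper).

    Assumes x.size is a multiple of group_size. No calibration data is
    used: the scale comes only from each group's own maximum magnitude.
    """
    x = np.asarray(x, dtype=np.float64)
    groups = x.reshape(-1, group_size)

    # One shared power-of-two scale per group, chosen so that the group's
    # largest magnitude lands in the top binade of the FP4 range.
    amax = np.abs(groups).max(axis=1, keepdims=True)
    safe_amax = np.where(amax > 0, amax, 1.0)  # avoid log2(0) for all-zero groups
    shared_exp = np.floor(np.log2(safe_amax)) - FP4_EMAX
    scale = 2.0 ** shared_exp

    # Cast each scaled element to the nearest FP4 magnitude, saturating at 6.
    # Ties resolve to the smaller magnitude here, a simplification of
    # hardware rounding modes.
    scaled = groups / scale
    mag = np.minimum(np.abs(scaled), FP4_GRID[-1])
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * FP4_GRID[idx]

    return (quantized * scale).reshape(x.shape)
```

Calling `mxfp4_fake_quant` on a group made of a single value of 48 followed by thirty-one values of 1.0 gives a shared scale of 8: the large value is preserved, while the small values fall below half of the smallest nonzero step and round to zero. Because each group of 32 carries its own scale, this effect stays confined to the group that contains the large value.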

Taxonomy

Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Model Reduction and Neural Networks