Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

Manyi Zhang; Ji-Fu Li; Zhongao Sun; Haoli Bai; Hui-Ling Zhen; Zhenhua Dong; Xianzhi Yu

arXiv:2601.09555 · cs.CL · January 15, 2026

TL;DR

This paper systematically evaluates post-training quantization of large language models using Microscaling Floating-Point formats, revealing the strengths and limitations of various algorithms and providing practical insights for effective MXFP-based quantization.

Contribution

The paper offers the first comprehensive analysis of PTQ algorithms under MXFP formats across multiple models and benchmarks, highlighting format-specific challenges and optimization strategies.

Findings

01 MXFP8 achieves near-lossless performance
02 MXFP4 causes significant accuracy loss
03 Quantization sensitivity is mainly influenced by the language model

Abstract

Microscaling Floating-Point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Although various post-training quantization (PTQ) algorithms have been proposed, most focus on integer quantization, and their applicability and behavior under MXFP formats remain largely unexplored. To address this gap, this work conducts a systematic investigation of PTQ under MXFP formats, encompassing over 7 PTQ algorithms, 15 evaluation benchmarks, and 3 LLM families. The key findings include: 1) MXFP8 consistently achieves near-lossless performance, while MXFP4 introduces substantial accuracy degradation and remains challenging; 2) PTQ effectiveness under MXFP depends strongly on format compatibility, with some algorithmic paradigms being consistently more effective than others; 3) PTQ performance exhibits highly consistent trends across model families and…
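To give a rough feel for the MXFP8 vs. MXFP4 gap, here is a minimal Python sketch of MX "fake quantization" (quantize, then dequantize). It assumes the OCP Microscaling v1.0 layout: blocks of 32 elements share one power-of-two (E8M0) scale, and each element is stored as FP8 E4M3 (for MXFP8; the spec also allows E5M2, not shown) or FP4 E2M1 (for MXFP4). The function names, the synthetic Gaussian data, and the simplified tie-breaking for FP4 are illustrative assumptions, not the paper's evaluation code.

```python
import math
import random

# Assumed element formats, following the OCP Microscaling (MX) v1.0 spec.
BLOCK_SIZE = 32  # number of elements sharing one scale

# FP4 E2M1: all non-negative representable magnitudes; largest exponent is 2.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_E2M1_EMAX = 2

# FP8 E4M3: largest exponent 8, max normal 448, 3 mantissa bits, min normal 2**-6.
FP8_E4M3_EMAX = 8
FP8_E4M3_MAX = 448.0


def round_fp4_e2m1(mag):
    """Round a non-negative value to the nearest E2M1 point, saturating at 6.

    Ties go to the smaller magnitude (min keeps the first match), a
    simplification of round-to-nearest-even.
    """
    return min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))


def round_fp8_e4m3(mag):
    """Round a non-negative value to E4M3 (round-half-to-even), saturating at 448."""
    mag = min(mag, FP8_E4M3_MAX)
    if mag == 0.0:
        return 0.0
    if mag < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal spacing
    else:
        _, e = math.frexp(mag)  # mag = m * 2**e with 0.5 <= m < 1
        step = 2.0 ** (e - 1 - 3)  # 3 mantissa bits below the leading bit
    return min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_mx_block(block, fmt):
    """Fake-quantize one block: shared power-of-two scale, then per-element rounding."""
    if fmt == "mxfp4":
        emax, round_elem = FP4_E2M1_EMAX, round_fp4_e2m1
    elif fmt == "mxfp8":
        emax, round_elem = FP8_E4M3_EMAX, round_fp8_e4m3
    else:
        raise ValueError(f"unsupported format: {fmt}")
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    _, e = math.frexp(max_abs)
    # Shared exponent = floor(log2(max_abs)) - emax, clamped to the E8M0 range.
    shared_exp = max(-127, min(127, (e - 1) - emax))
    scale = 2.0 ** shared_exp
    return [math.copysign(round_elem(abs(x) / scale), x) * scale for x in block]


def mx_fake_quantize(values, fmt, block_size=BLOCK_SIZE):
    """Quantize-dequantize a flat list block by block."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_mx_block(values[start:start + block_size], fmt))
    return out


def relative_error(ref, approx):
    """Relative L2 reconstruction error ||ref - approx|| / ||ref||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, approx)))
    den = math.sqrt(sum(a * a for a in ref))
    return num / den


random.seed(0)
# Synthetic stand-in for a weight tensor (hypothetical data, not from the paper).
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]
err8 = relative_error(weights, mx_fake_quantize(weights, "mxfp8"))
err4 = relative_error(weights, mx_fake_quantize(weights, "mxfp4"))
print(f"MXFP8 relative error: {err8:.4f}")
print(f"MXFP4 relative error: {err4:.4f}")
```

Both formats use the same block-shared scale; the difference is the element grid. E4M3 keeps three mantissa bits per element, while E2M1 keeps one, so the MXFP4 reconstruction error comes out noticeably larger. Note also that the scale rule can push the block maximum past the largest element value (6 for E2M1, 448 for E4M3), in which case it saturates. This measures only tensor-level error on synthetic data, not the downstream accuracy the paper benchmarks.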

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Advanced Neural Network Applications