Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4
Musa Cim, Burak Topcu, Mahmut Taylan Kandemir

TL;DR
This paper systematically analyzes the quantization sensitivity of FP4 formats in large language models, revealing layer-specific sensitivities and differences between formats across multiple model scales.
Contribution
It provides a detailed layer-wise and block-wise sensitivity analysis of MXFP4 and NVFP4 formats in LLMs, highlighting which components are most affected by quantization.
Findings
MLP layers are most sensitive to FP4 quantization.
Gate and attention layers are less sensitive than MLP layers.
Early blocks can be highly sensitive, especially with MXFP4.
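The Contribution above describes a layer-wise and block-wise sensitivity analysis. One common way to run such an analysis is a one-at-a-time sweep. You quantize a single layer, or every layer in one block, and keep everything else in full precision. You then rank the groups by how far the output drifts. The sketch below does this on a hypothetical toy stack of tanh layers with a compact MX-style FP4 fake-quantizer. The layer names, the toy model, and the output-MSE metric are all illustrative assumptions. The paper studies real Qwen2.5 models, and its evaluation metric is not stated in this summary.

```python
import math
import random

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes


def fake_quant_fp4(row, block=32):
    """Compact MX-style stand-in (an assumption, not the paper's quantizer):
    power-of-two scale per block, E2M1 elements, saturating at 6x the scale."""
    out = []
    for i in range(0, len(row), block):
        blk = row[i:i + block]
        amax = max(abs(v) for v in blk)
        # 2 ** (floor(log2(amax)) - 2); frexp gives an exact floor(log2) + 1.
        scale = 2.0 ** (math.frexp(amax)[1] - 3) if amax > 0 else 1.0
        for v in blk:
            mag = min(abs(v) / scale, 6.0)
            q = min(FP4_GRID, key=lambda g: abs(g - mag))
            out.append(math.copysign(q, v) * scale)
    return out


def forward(layers, x):
    """Hypothetical toy model: each layer is a square weight matrix followed by tanh."""
    for _, w in layers:
        x = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    return x


def output_mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def sensitivity(layers, groups, inputs):
    """Quantize only the layers in each group, keep the rest in full precision,
    and rank groups by mean output MSE against the unquantized model."""
    reference = [forward(layers, x) for x in inputs]
    scores = {}
    for label, members in groups.items():
        trial = [(n, [fake_quant_fp4(row) for row in w]) if n in members else (n, w)
                 for n, w in layers]
        errs = [output_mse(forward(trial, x), ref) for x, ref in zip(inputs, reference)]
        scores[label] = sum(errs) / len(errs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage on random weights; the layer names are hypothetical labels.
random.seed(0)
d = 32
names = ["block0.attn", "block0.mlp", "block1.attn", "block1.mlp"]
layers = [(n, [[random.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)])
          for n in names]
inputs = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(8)]

# Layer-wise: one layer per group. Block-wise: all layers sharing a block prefix.
layer_wise = sensitivity(layers, {n: {n} for n in names}, inputs)
block_wise = sensitivity(layers, {b: {n for n in names if n.startswith(b + ".")}
                                  for b in ("block0", "block1")}, inputs)
for label, score in layer_wise + block_wise:
    print(f"{label:12s} {score:.3e}")
```

You can reuse the same loop for a format comparison by swapping `fake_quant_fp4` for an NVFP4-style quantizer. You can also replace the toy stack with a real model's projection layers and a task metric such as perplexity. Both substitutions are assumptions left to the reader.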
Abstract
Quantization addresses the high resource demand of large language models (LLMs) by alleviating memory pressure and bandwidth congestion and by significantly scaling available compute throughput, with a tolerable impact on accuracy. Four-bit floating point (FP4), the lowest-precision format that preserves essential numerical properties such as exponent and sign, has begun to be adopted in cutting-edge architectures, including NVIDIA Blackwell and AMD CDNA, to support LLM quantization and reduce deployment costs. Although aggressive quantization can yield efficiency gains, the quantization sensitivity of individual layers within the transformer, and whether these sensitivities generalize across existing FP4 formats and model scales, remain underexplored. To elucidate quantization sensitivity, this study conducts a systematic analysis of two FP4 formats, MXFP4 and NVFP4, across three Qwen2.5 model scales (0.5B, 7B, and…
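The two formats named in the abstract use the same FP4 (E2M1) element encoding. They differ in how each block of elements is scaled. The sketch below fake-quantizes a flat list of weights under both formats. It assumes the commonly documented recipes. MXFP4 (OCP Microscaling spec) uses 32-element blocks that share a power-of-two E8M0 scale. NVFP4 uses 16-element blocks with an FP8 E4M3 scale, plus one FP32 per-tensor scale. The function names and the amax-based scale selection are illustrative assumptions, not the authors' code. E8M0 exponent-range limits and NaN/Inf handling are not modeled.

```python
import math
import random

# E2M1 magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
E4M3_MAX = 448.0  # largest finite FP8 E4M3 (FN variant) value


def round_to_fp4(v):
    """Round to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude (a simplification of round-half-to-even)."""
    mag = min(abs(v), FP4_MAX)
    q = min(FP4_GRID, key=lambda g: abs(g - mag))
    return math.copysign(q, v)


def round_to_e4m3(v):
    """Round a non-negative scale to the nearest FP8 E4M3 value, saturating at 448."""
    if v <= 0.0:
        return 0.0
    exp = max(math.frexp(v)[1] - 1, -6)  # floor(log2 v); below 2**-6 keep the subnormal step
    step = 2.0 ** (exp - 3)              # 3 mantissa bits
    return min(round(v / step) * step, E4M3_MAX)


def quantize_mxfp4(xs, block=32):
    """MXFP4 fake-quantization: each 32-element block shares a power-of-two (E8M0) scale."""
    out = []
    for i in range(0, len(xs), block):
        blk = xs[i:i + block]
        amax = max(abs(v) for v in blk)
        if amax == 0.0:
            out.extend([0.0] * len(blk))
            continue
        # OCP MX rule: shared exponent = floor(log2(amax)) - emax(E2M1), with emax = 2.
        scale = 2.0 ** (math.frexp(amax)[1] - 1 - 2)
        out.extend(round_to_fp4(v / scale) * scale for v in blk)
    return out


def quantize_nvfp4(xs, block=16):
    """NVFP4 fake-quantization: 16-element blocks with an E4M3 scale, plus one FP32
    per-tensor scale that keeps the block scales inside E4M3's range."""
    tensor_amax = max(abs(v) for v in xs)
    if tensor_amax == 0.0:
        return [0.0] * len(xs)
    global_scale = tensor_amax / (FP4_MAX * E4M3_MAX)
    out = []
    for i in range(0, len(xs), block):
        blk = xs[i:i + block]
        amax = max(abs(v) for v in blk)
        # Map the block's amax onto FP4's max (6), then store that scale in E4M3.
        scale = round_to_e4m3(amax / FP4_MAX / global_scale) * global_scale
        if scale == 0.0:
            out.extend([0.0] * len(blk))
            continue
        out.extend(round_to_fp4(v / scale) * scale for v in blk)
    return out


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Usage: Gaussian "weights" with one injected outlier.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
weights[5] = 0.5
print("MXFP4 MSE:", mse(quantize_mxfp4(weights), weights))
print("NVFP4 MSE:", mse(quantize_nvfp4(weights), weights))
```

Under the MX rule, a block's maximum lands somewhere in [4, 8) on the FP4 grid. It is then either clamped to 6 or leaves part of the grid unused. NVFP4's E4M3 scale maps each 16-element block's maximum close to 6. This sketch alone does not establish whether that difference explains the format-dependent sensitivities reported by the authors.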
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Embedded Systems Design Techniques
