Bit-Accurate Modeling of GPU Matrix Multiply-Accumulate Units: Demystifying Numerical Discrepancy and Accuracy

Peichen Xie; Shuotao Xu; Yang Wang; Fan Yang; Mao Yang

arXiv:2511.10909 · cs.AR · April 17, 2026

TL;DR

This paper develops bit-accurate models of GPU matrix multiply-accumulate units to explain numerical discrepancies and accuracy issues across architectures, aiding diagnosis and guiding future design.

Contribution

The paper introduces a systematic framework for constructing complete arithmetic models of matrix multiply-accumulate units (MMAUs), providing the first bit-accurate analysis across multiple GPU architectures.

Findings

01 The models explain cross-platform numerical discrepancies.

02 The analysis identifies four precision bottlenecks that affect accuracy.

03 The paper provides software workarounds and design guidance.

Abstract

Modern AI accelerators rely on matrix multiply-accumulate units (MMAUs), such as NVIDIA Tensor Cores and AMD Matrix Cores, to accelerate deep neural network workloads. MMAUs expose only instruction-level or API-level interfaces of matrix multiply-accumulate (MMA) operations, while leaving internal floating-point arithmetic behaviors undocumented. Consequently, MMAUs across vendors and architectural generations often produce numerical discrepancies for identical inputs, and sometimes exhibit reduced numerical accuracy that can cause training instability. Diagnosing and understanding the root causes of these effects is challenging without white-box models of their arithmetic behaviors. This paper proposes closed-loop feature probing (CLFP), a generic and systematic framework for constructing complete arithmetic behavior models of MMA operations. Based on this framework, we analyze all MMA…
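
As a rough illustration of why undocumented internal arithmetic can produce different results for identical inputs, the sketch below computes the same dot product with two accumulation orders in ordinary IEEE 754 double precision. It is not the paper's CLFP framework and does not model any real MMAU; the function names `dot_sequential` and `dot_pairwise` and the input vectors are hypothetical, chosen only to make the rounding effect visible.

```python
import math

def dot_sequential(a, b):
    """Accumulate products left to right, rounding after every addition."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

def dot_pairwise(a, b):
    """Accumulate products in a binary tree (assumes non-empty inputs)."""
    products = [x * y for x, y in zip(a, b)]
    while len(products) > 1:
        paired = [products[i] + products[i + 1]
                  for i in range(0, len(products) - 1, 2)]
        if len(products) % 2:
            paired.append(products[-1])  # carry the unpaired element forward
        products = paired
    return products[0]

# Hypothetical inputs: large terms that cancel, plus small terms that can be
# lost to rounding depending on when they are added.
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1.0, 1.0]

seq = dot_sequential(a, b)                            # 1.0
tree = dot_pairwise(a, b)                             # 0.0
exact = math.fsum(x * y for x, y in zip(a, b))        # 2.0 (correctly rounded sum)

print(seq, tree, exact)
```

The three results differ even though the inputs are identical, because each intermediate addition rounds to the nearest representable value. Hardware units that differ in accumulation order, intermediate precision, or rounding mode can diverge in a similar way, which is why a model of the actual internal behavior is needed to predict the output bits.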

Code & Models

Repositories

microsoft/MMA-Sim (GitHub)
