Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

Ahmad Abdelfattah; Jack Dongarra; Massimiliano Fasi; Mantas Mikaitis; Françoise Tisseur

arXiv:2506.11277·math.NA·May 11, 2026

TL;DR

This paper analyzes a method to perform floating-point matrix multiplication using integer arithmetic, focusing on accuracy-performance tradeoffs and practical implementation on modern GPU hardware.

Contribution

The paper introduces an inexpensive way to estimate the minimum number of slices needed for a prescribed accuracy, and evaluates the method on NVIDIA GPUs.

Findings

01. More slices improve accuracy but increase computation.

02. The algorithm can be inaccurate with badly scaled matrices.

03. Experimental results confirm the theoretical analysis.

Abstract

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer slices, the product of these slices is computed exactly, and AB is approximated by accumulating these integer products in floating-point arithmetic. This technique is particularly well suited to mixed-precision matrix multiply-accumulate units with integer support, such as the NVIDIA tensor cores or the AMD matrix cores. The number of slices allows for performance-accuracy tradeoffs: more slices yield better accuracy but require more multiplications, which in turn reduce performance. We propose an inexpensive way to estimate the minimum number of multiplications needed to achieve a prescribed level of accuracy. Our error analysis shows that the…
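
The slicing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `split_rows` and `int_sliced_matmul`, the choice of 7 bits per slice, the per-row power-of-two scaling, and the use of `int64` in place of hardware integer units are all assumptions made for illustration. The sketch also multiplies every pair of slices, which is a simplification.

```python
import numpy as np

def split_rows(A, num_slices, bits=7):
    """Split A into integer slices, scaling each row by a power of two.

    Hypothetical helper: A is approximated by
    scale * sum_s slices[s-1] * 2**(-bits * s).
    """
    row_max = np.max(np.abs(A), axis=1, keepdims=True)
    _, exp = np.frexp(row_max)      # row_max = m * 2**exp with 0.5 <= m < 1
    scale = np.ldexp(1.0, exp)      # power of two; scaled entries lie in (-1, 1)
    R = A / scale                   # exact: division by a power of two
    slices = []
    for _ in range(num_slices):
        R = R * 2.0**bits           # expose the next `bits` bits
        S = np.trunc(R)             # integer part, |S| < 2**bits
        slices.append(S.astype(np.int64))
        R = R - S                   # remainder carried to the next slice
    return slices, scale

def int_sliced_matmul(A, B, num_slices, bits=7):
    """Approximate A @ B from integer slice products (illustrative sketch)."""
    A_slices, row_scale = split_rows(A, num_slices, bits)
    Bt_slices, col_scale = split_rows(B.T, num_slices, bits)  # scale columns of B
    C = np.zeros((A.shape[0], B.shape[1]))
    for s, As in enumerate(A_slices, start=1):
        for t, Bt in enumerate(Bt_slices, start=1):
            P = As @ Bt.T           # integer product, computed exactly
            # Accumulate in floating point with the matching power-of-two weight.
            C += np.ldexp(P.astype(np.float64), -bits * (s + t))
    return C * row_scale * col_scale.T

# Relative error against the float64 product for a growing number of slices.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
ref = A @ B
errors = [
    np.linalg.norm(int_sliced_matmul(A, B, s) - ref) / np.linalg.norm(ref)
    for s in (1, 2, 4, 8)
]
for s, e in zip((1, 2, 4, 8), errors):
    print(f"slices={s}: relative error {e:.2e}")
```

In this sketch the number of integer products grows with the square of the number of slices, which is the performance-accuracy tradeoff the abstract refers to. Because a single scale is shared by a whole row of A (or column of B), small entries in a row dominated by one large entry lose leading bits to the scaling.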
