On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
Tadisetty Sai Yashwanth

TL;DR
This paper investigates the structure of floating-point errors in GPU matrix multiplication and finds that the errors are highly correlated and structured rather than independent Gaussian noise, which matters for any deep learning reliability analysis that relies on the i.i.d. assumption.
Contribution
The paper empirically demonstrates that floating-point errors in GPU matmul are correlated and structured, challenging the common i.i.d. Gaussian noise assumption.
Findings
Floating-point errors are highly correlated in GPU matmul.
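The error covariance is strongly structured: for float16, nearly 50% of the total error variance lies in off-diagonal terms.
The i.i.d. Gaussian noise model predicts non-zero output instability, whereas the measured prediction flip rate is 0.00%.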
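One way to quantify how much of the error covariance sits off the diagonal is sketched below in NumPy. This is an illustration under stated assumptions, not the paper's code. The helper name `offdiag_variance_fraction`, the use of absolute covariance mass as the measure, and the synthetic error samples are all hypothetical, and this summary does not give the paper's exact definition of the off-diagonal share.

```python
import numpy as np


def offdiag_variance_fraction(errors):
    """Share of absolute covariance mass that lies off the diagonal.

    errors: array of shape (n_samples, n_outputs), one error vector per row.
    NOTE: this definition is an assumption made for illustration only.
    """
    cov = np.cov(errors, rowvar=False)      # (n_outputs, n_outputs)
    total = np.abs(cov).sum()               # mass of all entries
    diag = np.abs(np.diag(cov)).sum()       # mass of the per-output variances
    return float((total - diag) / total)


rng = np.random.default_rng(0)
n_samples, n_outputs = 20000, 8

# Synthetic i.i.d. Gaussian errors: covariance is close to diagonal.
iid = rng.normal(size=(n_samples, n_outputs))

# Synthetic correlated errors: one shared component plus small independent noise.
shared = rng.normal(size=(n_samples, 1))
correlated = shared + 0.1 * rng.normal(size=(n_samples, n_outputs))

frac_iid = offdiag_variance_fraction(iid)
frac_corr = offdiag_variance_fraction(correlated)
print(f"i.i.d. errors:     {frac_iid:.3f}")
print(f"correlated errors: {frac_corr:.3f}")
```

Under an i.i.d. model the fraction stays near zero, so a large off-diagonal share in measured errors is direct evidence against that model. To apply this to real data, `errors` would hold the differences between two matmul outputs, which is not done here.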
Abstract
Floating-point non-associativity makes fundamental deep learning operations, such as matrix multiplication (matmul) on GPUs, inherently non-deterministic. Despite this, the statistical structure of the resulting numerical error remains poorly understood. A common working assumption is that these errors behave as independent and identically distributed (i.i.d.) Gaussian noise. In this paper, we empirically test this assumption and show that it fails to describe real GPU behavior. By comparing outputs of single-input and batched matmuls, we find that while the i.i.d. model predicts non-zero output instability, empirical results show a 0.00% prediction flip rate. Through covariance analysis, we uncover the cause: the floating-point error is structured and highly correlated. For float16, nearly 50% of the total error variance lies in off-diagonal terms, revealing that the noise behaves as a…
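The two ingredients mentioned in the abstract, non-associativity of floating-point addition and the prediction flip rate, can be illustrated with a small CPU-only NumPy sketch. It does not reproduce the paper's GPU experiment: the helper `flip_rate` and the toy logits are hypothetical, and the flip rate is assumed here to mean the fraction of rows whose argmax differs between two sets of outputs.

```python
import numpy as np

# float16 has an 11-bit significand, so values near 2048 are spaced 2 apart.
a, b, c = np.float16(2048), np.float16(1), np.float16(1)

# (a + b) + c: 2049 is not representable and rounds to 2048 at each step.
left = np.float16(np.float16(a + b) + c)
# a + (b + c): 1 + 1 = 2 is exact, and 2048 + 2 = 2050 is representable.
right = np.float16(a + np.float16(b + c))
print(float(left), float(right))


def flip_rate(logits_a, logits_b):
    """Fraction of rows whose argmax (predicted class) differs.

    NOTE: assumed definition, used only for illustration.
    """
    pred_a = np.argmax(logits_a, axis=1)
    pred_b = np.argmax(logits_b, axis=1)
    return float(np.mean(pred_a != pred_b))


# Toy logits standing in for single-input and batched outputs (hypothetical).
single = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
batched = single + 1e-3                   # same shift on every logit keeps argmax
flipped = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])

rate_same = flip_rate(single, batched)
rate_flipped = flip_rate(single, flipped)
print(rate_same, rate_flipped)
```

The first part shows why the order of accumulation can change a matmul result. The second shows that numerical differences between outputs only change predictions when they reorder the largest logits, which is consistent with a perturbation that shifts all logits together producing no flips.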
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Stochastic Gradient Optimization Techniques
