Can Tensor Cores Benefit Memory-Bound Kernels? (No!)
Lingqi Zhang, Jiajun Huang, Sheng Di, Satoshi Matsuoka, Mohamed Wahib

TL;DR
Tensor cores, despite their success in compute-bound tasks, do not significantly improve performance on memory-bound kernels, with theoretical and empirical analyses showing limited speedup over CUDA cores.
Contribution
This paper provides a theoretical and empirical analysis demonstrating that tensor cores offer minimal performance benefits for memory-bound kernels, challenging prior optimistic claims.
Findings
Theoretical maximum speedup of tensor cores over CUDA cores for memory-bound kernels is only 1.33x in double precision (V100, A100, and H100 GPUs).
Empirical validation on STREAM Scale, SpMV, and stencil kernels shows limited performance gains.
Tensor core versions of memory-bound kernels do not outperform their CUDA core counterparts.
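To illustrate why extra compute throughput does little for a memory-bound kernel, here is a minimal roofline-style sketch for STREAM Scale. It is not the paper's analysis and does not reproduce the 1.33x bound. The function name and all device numbers are hypothetical, chosen only for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Simple roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical device numbers (assumptions, not taken from the paper).
BANDWIDTH_GBS = 1500.0         # memory bandwidth in GB/s
CUDA_PEAK_GFLOPS = 10_000.0    # double-precision peak on the regular cores
TENSOR_PEAK_GFLOPS = 20_000.0  # double-precision peak on the tensor cores

# STREAM Scale, a[i] = s * b[i], in double precision:
# 1 flop per element, 8 bytes read plus 8 bytes written.
SCALE_INTENSITY = 1.0 / 16.0   # flops per byte

cuda = attainable_gflops(CUDA_PEAK_GFLOPS, BANDWIDTH_GBS, SCALE_INTENSITY)
tensor = attainable_gflops(TENSOR_PEAK_GFLOPS, BANDWIDTH_GBS, SCALE_INTENSITY)
speedup = tensor / cuda

print(f"CUDA cores:   {cuda:.2f} GFLOP/s")
print(f"Tensor cores: {tensor:.2f} GFLOP/s")
print(f"Speedup:      {speedup:.2f}x")
```

Under these assumed numbers both paths hit the same memory ceiling, so doubling the compute peak changes nothing. This simplified model ignores the finer effects that a more detailed analysis would account for, which is why it predicts no speedup at all rather than a small bounded one.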
Abstract
Tensor cores are specialized processing units within GPUs that have demonstrated significant efficiency gains in compute-bound applications such as deep learning training by accelerating dense matrix operations. Given their success, researchers have attempted to extend tensor core capabilities beyond dense matrix computations to other computational patterns, including memory-bound kernels. Recent studies have reported that tensor cores can outperform traditional CUDA cores even on memory-bound kernels, where the primary performance bottleneck is not computation. In this research, we challenge these findings through both theoretical and empirical analysis. Our theoretical analysis reveals that tensor cores can achieve a maximum speedup of only 1.33x over CUDA cores for memory-bound kernels in double precision (for V100, A100, and H100 GPUs). We validate this theoretical limit through…
Taxonomy
Topics: Computational Physics and Python Applications · Tensor decomposition and applications
