Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

Aaron Jarmusch; Sunita Chandrasekaran

arXiv:2605.04178·cs.DC·May 7, 2026

TL;DR

This paper presents microbenchmark-based analytical performance models for NVIDIA Blackwell (B200) and AMD CDNA3 (MI300A) GPUs. The models reach 1.31% and 0.09% MAE respectively and are further validated with Rodinia 3.1 and SPEChpc 2021 Tiny. An open-source release is planned.

Contribution

The paper introduces analytical performance models for NVIDIA Blackwell and AMD CDNA3 GPUs, built on systematic microbenchmark characterization, that are substantially more accurate than naive roofline baselines.

Findings

01

Models achieve 1.31% MAE on Blackwell B200 (21 kernels) and 0.09% MAE on MI300A (27 kernels).

02

Naive roofline baselines exceed 95% error on the same kernels.

03

Models are adaptable to other GPU architectures with minimal restructuring.

Abstract

Rapidly evolving GPU architectures featuring complex memory hierarchies, matrix units, and varied precision formats continue to widen the gap between theoretical peaks and achievable performance. We design and develop analytical performance models for NVIDIA Blackwell (B200) and AMD CDNA3 (MI300A) grounded in systematic microbenchmark characterization. For Blackwell, the model captures Tensor Memory (TMEM), asynchronous bulk copy (TMA), and 5th-generation tensor cores; for CDNA3, the model captures Infinity Cache hierarchy, VGPR constraints, and occupancy. Validation yields 1.31% MAE on B200 (21 kernels) and 0.09% on MI300A (27 kernels), while naive roofline baselines exceed 95% error on the same kernels. We further validate the models using Rodinia 3.1 and SPEChpc 2021 Tiny. The models are updated with HBM bandwidth, capacity, and cache parameters and applied to H200 (Hopper) and MI250X…
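To make the comparison in the abstract concrete, the following is a minimal sketch of what a naive roofline baseline and a percentage-error metric could look like. It is not the authors' model or code. The class names, function names, and every hardware and kernel number are hypothetical, and reading "MAE" as a mean absolute percentage error against measured runtime is an assumption, since this summary does not define the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description: only the two roofline peaks."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel description: total work and total memory traffic."""
    flops: float
    bytes_moved: float


def roofline_time(kernel: Kernel, device: Device) -> float:
    """Naive roofline runtime: the slower of the compute and memory bounds.

    This ignores caches, occupancy, and register limits, which is why a
    baseline of this kind can be far from measured runtime.
    """
    compute_time = kernel.flops / device.peak_flops
    memory_time = kernel.bytes_moved / device.peak_bandwidth
    return max(compute_time, memory_time)


def mean_abs_pct_error(predicted: list[float], measured: list[float]) -> float:
    """Mean absolute error of predictions, as a percentage of measured values."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("predicted and measured must be non-empty and equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Illustrative numbers only; they do not describe any real GPU or kernel.
    device = Device(peak_flops=1e12, peak_bandwidth=1e11)
    kernels = [Kernel(flops=2e9, bytes_moved=1e9), Kernel(flops=8e9, bytes_moved=2e8)]
    predicted = [roofline_time(k, device) for k in kernels]
    measured = [0.05, 0.02]  # made-up measured runtimes in seconds
    print(f"roofline error: {mean_abs_pct_error(predicted, measured):.1f}%")
```

In this framing, porting a baseline to another GPU amounts to constructing a different `Device`. The models described in the abstract add architecture-specific terms (for example cache hierarchy and occupancy) that this sketch leaves out on purpose.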
