Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A
Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran

TL;DR
This paper provides a detailed analysis of AMD MI300A's FP8 matrix cores, asynchronous execution, and structured sparsity, offering insights into their performance and system-level implications for HPC workloads.
Contribution
It offers the first execution-centric microbenchmark-based characterization of FP8 cores, ACE, and sparsity on MI300A, guiding optimization and scheduling strategies.
Findings
FP8 matrix-core performance depends on occupancy thresholds, which the paper quantifies.
Concurrent execution on the asynchronous compute engines involves trade-offs between fairness and throughput.
The performance benefits of 2:4 structured sparsity are context-dependent.
Abstract
The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity. These capabilities are increasingly relied upon by modern HPC and HPC-AI workloads, yet their execution characteristics and system-level implications remain insufficiently understood. In this paper, we present an execution-centric characterization of FP8 matrix execution, ACE concurrency, and structured sparsity on MI300A using targeted microbenchmarks. We quantify occupancy thresholds, fairness, throughput trade-offs under concurrent execution, and context-dependent sparsity benefits. We evaluate representative case studies - transformer-style, concurrent, and mixed-precision kernels - to show how these effects translate into application-level performance and predictability. Our results provide…
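For readers unfamiliar with the 2:4 structured sparsity pattern named in the abstract, the following is a minimal sketch of the constraint itself: in every group of four consecutive values, at most two may be non-zero. The function names `prune_2_to_4` and `is_2_to_4_sparse` are hypothetical, and magnitude-based pruning is an assumption chosen for illustration. It is not the paper's method and does not use any AMD library.

```python
def prune_2_to_4(row):
    """Return a copy of `row` where, in each group of 4 consecutive values,
    only the 2 largest-magnitude entries are kept (illustrative assumption)."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_to_4_sparse(row):
    """Check that every group of 4 consecutive values has at most 2 non-zeros."""
    return len(row) % 4 == 0 and all(
        sum(1 for v in row[s:s + 4] if v != 0) <= 2
        for s in range(0, len(row), 4)
    )


weights = [0.1, -0.9, 0.4, 0.05, 1.0, 2.0, -3.0, 0.5]
sparse_weights = prune_2_to_4(weights)
```

Here `sparse_weights` keeps half of the original entries, which is the property that sparsity-aware matrix hardware can exploit. Whether that translates into a speedup is, per the abstract, context-dependent.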
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Real-Time Systems Scheduling
