Metrics and Design of an Instruction Roofline Model for AMD GPUs

Matthew Leinhauser; René Widera; Sergei Bastrakov; Alexander Debus; Michael Bussmann; Sunita Chandrasekaran

arXiv:2110.08221·cs.DC·November 11, 2021

TL;DR

This paper develops an instruction roofline model for AMD GPUs to evaluate application performance, working around the limitations of current AMD profiling tools, and compares AMD and NVIDIA GPUs using the scientific application PIConGPU.

Contribution

The paper introduces an instruction roofline model for AMD GPUs built with AMD's ROCProfiler and the HIP implementation of BabelStream, so that application performance can be measured in instructions and memory transactions and compared with NVIDIA GPUs.

Findings

01. AMD MI100 achieves similar or better execution time than NVIDIA V100.

02. Profiling tool differences complicate direct performance comparisons.

03. AMD MI60 shows the worst performance among the tested GPUs.

Abstract

Following the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and NVIDIA-GPU systems. Because profiling tools for AMD GPUs are currently limited, this shift leaves a void in how to measure application performance on AMD GPUs. In this paper, we design an instruction roofline model for AMD GPUs using AMD's ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application's performance in instructions and memory transactions on new AMD hardware. Specifically, we create instruction roofline models for a case-study scientific application, PIConGPU, an open-source particle-in-cell (PIC) simulation application used for plasma and laser-plasma physics on the NVIDIA V100, AMD…
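As a rough illustration of what measuring performance "in instructions and memory transactions" can look like, the sketch below computes the two coordinates of a point on a generic instruction roofline plot, plus the ceiling above it. It is a minimal sketch under stated assumptions, not the paper's exact metric definitions: the function names, the assumption that instruction counts are already aggregated per wavefront, and all numeric values are hypothetical and do not come from ROCProfiler, BabelStream, or any measured GPU.

```python
# Generic instruction-roofline arithmetic (illustrative only).
# Assumptions: instruction counts are already at wavefront (or warp) level,
# and every number used below is a made-up placeholder, not a measurement.

def instruction_intensity(wave_instructions: float, memory_transactions: float) -> float:
    """x-axis: wavefront-level instructions executed per memory transaction."""
    return wave_instructions / memory_transactions


def achieved_gips(wave_instructions: float, runtime_s: float) -> float:
    """y-axis: billions of wavefront-level instructions per second (GIPS)."""
    return wave_instructions / runtime_s / 1e9


def roofline_ceiling(intensity: float, peak_gips: float, peak_gtxn_per_s: float) -> float:
    """Attainable GIPS: the lower of the compute roof and the memory roof."""
    return min(peak_gips, intensity * peak_gtxn_per_s)


if __name__ == "__main__":
    # Hypothetical kernel: 2e9 instructions, 4e9 transactions, 0.5 s runtime.
    x = instruction_intensity(2.0e9, 4.0e9)
    y = achieved_gips(2.0e9, 0.5)
    # Hypothetical machine: 100 GIPS peak, 20 G transactions/s peak bandwidth.
    roof = roofline_ceiling(x, peak_gips=100.0, peak_gtxn_per_s=20.0)
    print(f"intensity={x}, achieved={y} GIPS, ceiling={roof} GIPS")
```

In a setup like the one the abstract describes, the instruction and transaction counts would come from a profiler and the bandwidth roof from a streaming benchmark; how those raw counters map onto the inputs above depends on the tool and the architecture.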

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems