Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Johannes Pekkilä, Oskar Lappi, Fredrik Robertsén, Maarit J. Korpi-Lagg

arXiv:2406.08923 · cs.DC · May 28, 2025

TL;DR

This paper evaluates the performance and energy efficiency of stencil computations on modern AMD and Nvidia datacenter GPUs and proposes platform-specific tuning strategies for making full use of their computational potential in high-performance computing.

Contribution

It provides a comparative analysis of AMD and Nvidia GPUs for stencil computations and introduces a new tuning strategy for fusing cache-heavy stencil kernels on both platforms.
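Kernel fusion in general can be sketched with two 1D stencil kernels that read the same input field. The Python below is a minimal, hypothetical illustration and not the authors' tuning strategy: `CountingArray` is an assumed stand-in that counts element reads as a rough proxy for memory loads, and the periodic boundary is an assumption made for brevity. The unfused version makes one pass per stencil, while the fused version loads each three-point neighborhood once and computes both outputs from it.

```python
# Hypothetical illustration of kernel fusion for two 1D stencils (not the paper's method).

class CountingArray:
    """Wraps a list, applies periodic indexing, and counts element reads
    as a rough stand-in for memory loads."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i % len(self.data)]  # periodic boundary (assumption)


def derivative_kernel(u):
    """Unfused pass 1: central first difference, 2 reads per point."""
    return [(u[i + 1] - u[i - 1]) / 2 for i in range(len(u))]


def laplacian_kernel(u):
    """Unfused pass 2: second difference, 3 reads per point."""
    return [u[i - 1] - 2 * u[i] + u[i + 1] for i in range(len(u))]


def fused_kernel(u):
    """Fused pass: load each 3-point neighborhood once, compute both outputs."""
    d1, d2 = [], []
    for i in range(len(u)):
        left, center, right = u[i - 1], u[i], u[i + 1]  # single load per neighbor
        d1.append((right - left) / 2)
        d2.append(left - 2 * center + right)
    return d1, d2


if __name__ == "__main__":
    u = CountingArray([1, 2, 3, 4])
    unfused = (derivative_kernel(u), laplacian_kernel(u))
    print("unfused reads:", u.reads)

    v = CountingArray([1, 2, 3, 4])
    fused = fused_kernel(v)
    print("fused reads:", v.reads)
    print("same results:", unfused == fused)
```

On four points, the two unfused passes perform 20 reads and the fused pass performs 12, with identical results. On a GPU, the trade-off is that a fused kernel does more work per thread and may use more registers and cache, so whether fusion pays off for cache-heavy kernels can depend on the platform.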

Findings

1. AMD and Nvidia GPUs show key hardware and software differences.
2. Platform-specific tuning is necessary for optimal performance.
3. The proposed strategies improve energy efficiency and computational throughput.

Abstract

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent introduction of AMD-manufactured graphics processors to the world's fastest supercomputers, tuning strategies established for previous hardware generations must be re-evaluated. In this study, we evaluate the performance and energy efficiency of stencil computations on modern datacenter graphics processors, and propose a tuning strategy for fusing cache-heavy stencil kernels. The studied cases comprise both synthetic and practical applications, which involve the evaluation of linear and nonlinear stencil functions in one to three dimensions. Our experiments reveal that AMD and Nvidia graphics processors exhibit key differences in both hardware and…
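The distinction between linear and nonlinear stencil functions mentioned in the abstract can be illustrated in one dimension. The sketch below is illustrative Python, not code from the paper; the function names, the periodic boundary, and the particular nonlinear term u[i] * (u[i+1] - u[i-1]) / 2 are assumptions chosen for the example. A linear stencil is a fixed weighted sum of neighboring values, whereas a nonlinear stencil combines neighbors through products or other nonlinear operations.

```python
# Illustrative 1D stencils on a periodic grid (hypothetical example, not from the paper).

def linear_stencil(u, coeffs):
    """Linear stencil: out[i] = sum_k coeffs[k] * u[i + k - r], with radius r."""
    n = len(u)
    r = len(coeffs) // 2  # stencil radius; coeffs is assumed to have odd length
    return [
        sum(c * u[(i + k - r) % n] for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def nonlinear_stencil(u):
    """Nonlinear stencil: out[i] = u[i] * (u[i+1] - u[i-1]) / 2,
    i.e. the field times its central first difference."""
    n = len(u)
    return [u[i] * (u[(i + 1) % n] - u[(i - 1) % n]) / 2 for i in range(n)]


if __name__ == "__main__":
    print(linear_stencil([1, 2, 3, 4], [1, -2, 1]))
    print(nonlinear_stencil([1, 2, 3, 4]))
```

For example, `linear_stencil([1, 2, 3, 4], [1, -2, 1])` applies the periodic three-point second-difference stencil and returns `[4, 0, 0, -4]`, and `nonlinear_stencil([1, 2, 3, 4])` returns `[-1.0, 2.0, 3.0, -4.0]`. Extending either function to two or three dimensions adds one neighbor offset per axis.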