Accelerating High-Order Stencils on GPUs

Ryuichi Sai; John Mellor-Crummey; Xiaozhu Meng; Mauricio Araya-Polo; Jie Meng

arXiv:2009.04619 · cs.DC · September 16, 2020 · 1 citation

TL;DR

This paper investigates high-order stencil computations on GPUs, focusing on seismic modeling, and presents optimized CUDA implementations that outperform an existing proprietary C/OpenACC code while maintaining performance portability across GPU architectures.

Contribution

It provides a detailed analysis and handcrafted CUDA implementations for high-order stencils, addressing boundary conditions and performance optimization on GPUs.

Findings

1. Achieved twice the performance of a proprietary C/OpenACC code.
2. Demonstrated excellent performance portability across GPU architectures.
3. Provided insights into memory and data-fetching patterns for high-order stencils.

Abstract

Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well studied in the literature, not all of the proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of the thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of CUDA implementations of a 25-point seismic modeling stencil, along with the related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other…
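To make the abstract's 25-point stencil concrete, here is a minimal C++ sketch of a plain CPU reference, the kind of code a GPU kernel is typically validated against. It assumes the common construction of a 25-point stencil as an 8th-order (radius-4) discrete Laplacian: 1 center point plus 4 neighbors in each of the six axis directions, weighted by the standard 8th-order central-difference coefficients. The names `laplacian25`, `idx`, `kRadius`, and `kCoeff`, and the coefficient values, are illustrative assumptions, not the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stencil radius: 1 center + 3 axes * 2 directions * 4 = 25 points.
constexpr int kRadius = 4;

// Standard 8th-order central-difference weights for a second derivative
// (an assumption for illustration; the paper's coefficients may differ).
constexpr double kCoeff[kRadius + 1] = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0};

// Linear index into an nx * ny * nz grid with x varying fastest.
inline std::size_t idx(int x, int y, int z, int nx, int ny) {
  return (static_cast<std::size_t>(z) * ny + y) * nx + x;
}

// Hypothetical helper: applies the 25-point Laplacian to interior points only.
// Points within kRadius of any face are left untouched, because they would
// need separate boundary-condition logic.
void laplacian25(const std::vector<double>& in, std::vector<double>& out,
                 int nx, int ny, int nz, double h) {
  assert(in.size() == static_cast<std::size_t>(nx) * ny * nz &&
         out.size() == in.size());
  const double inv_h2 = 1.0 / (h * h);
  for (int z = kRadius; z < nz - kRadius; ++z)
    for (int y = kRadius; y < ny - kRadius; ++y)
      for (int x = kRadius; x < nx - kRadius; ++x) {
        // The center weight contributes once per axis, hence 3 * kCoeff[0].
        double acc = 3.0 * kCoeff[0] * in[idx(x, y, z, nx, ny)];
        for (int r = 1; r <= kRadius; ++r) {
          // Symmetric neighbors at distance r along x, y, and z share a weight.
          acc += kCoeff[r] * (in[idx(x - r, y, z, nx, ny)] + in[idx(x + r, y, z, nx, ny)] +
                              in[idx(x, y - r, z, nx, ny)] + in[idx(x, y + r, z, nx, ny)] +
                              in[idx(x, y, z - r, nx, ny)] + in[idx(x, y, z + r, nx, ny)]);
        }
        out[idx(x, y, z, nx, ny)] = acc * inv_h2;
      }
}
```

Each interior output reads 25 inputs, while points within four cells of a face are skipped; those halo points are where the separate boundary-condition logic mentioned in the abstract would take over. A CUDA port would typically map the interior (x, y, z) iterations onto GPU threads instead of nested loops.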


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems