Accelerating High-Order Stencils on GPUs
Ryuichi Sai, John Mellor-Crummey, Xiaozhu Meng, Mauricio Araya-Polo, Jie Meng

TL;DR
This paper investigates high-order stencil computations on GPUs, focusing on seismic modeling, and presents optimized CUDA implementations that outperform existing proprietary solutions while maintaining portability.
Contribution
It provides a detailed analysis and handcrafted CUDA implementations for high-order stencils, addressing boundary conditions and performance optimization on GPUs.
Findings
Achieved twice the performance of a proprietary C/OpenACC code.
Demonstrated excellent performance portability across GPU architectures.
Provided insights into memory and data-fetching patterns for high-order stencils.
Abstract
Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of the proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of the thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA and related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other…
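To make the "25-point" stencil shape concrete, the following is a minimal CPU reference sketch in C++, not the authors' CUDA implementation. It assumes a star-shaped stencil of radius 4 (one center point plus 4 neighbors in each of the 6 axis directions, 1 + 6 × 4 = 25 points). The names `Grid` and `stencil25` and the coefficient array `c` are hypothetical, and the coefficient values are left to the caller. Points within 4 cells of a face are simply skipped here, which stands in for the separate boundary-condition logic the abstract mentions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int R = 4;  // assumed stencil radius: 1 + 6 * R = 25 points

// Hypothetical dense 3D grid, stored contiguously with x varying fastest.
struct Grid {
    int nx, ny, nz;
    std::vector<float> data;

    Grid(int x_dim, int y_dim, int z_dim, float init = 0.0f)
        : nx(x_dim), ny(y_dim), nz(z_dim),
          data(static_cast<std::size_t>(x_dim) * y_dim * z_dim, init) {}

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * ny + y) * nx + x;
    }
    float& at(int x, int y, int z) { return data[index(x, y, z)]; }
    float at(int x, int y, int z) const { return data[index(x, y, z)]; }
};

// Apply a radius-R star stencil to every interior point of `in`.
// c[0] weights the center; c[r] weights the six neighbors at distance r.
// Points closer than R to any face are left untouched in `out`.
void stencil25(const Grid& in, Grid& out, const std::array<float, R + 1>& c) {
    assert(in.nx == out.nx && in.ny == out.ny && in.nz == out.nz);
    for (int z = R; z < in.nz - R; ++z) {
        for (int y = R; y < in.ny - R; ++y) {
            for (int x = R; x < in.nx - R; ++x) {
                float acc = c[0] * in.at(x, y, z);
                for (int r = 1; r <= R; ++r) {
                    acc += c[r] * (in.at(x - r, y, z) + in.at(x + r, y, z) +
                                   in.at(x, y - r, z) + in.at(x, y + r, z) +
                                   in.at(x, y, z - r) + in.at(x, y, z + r));
                }
                out.at(x, y, z) = acc;
            }
        }
    }
}
```

Each output point reads 25 input values spread over 9 planes in z, so neighboring points share most of their inputs. That reuse is why memory hierarchy usage and data-fetching patterns matter so much for high-order stencils. In a GPU version, the three loops would typically be replaced by a mapping from thread indices to (x, y, z).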
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
