A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures
Ryuichi Sai, John Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo

TL;DR
StencilPy is a high-level framework that simplifies writing portable, high-performance stencil computations across diverse modern architectures, achieving performance comparable to hand-optimized code with significantly improved productivity.
Contribution
The paper introduces StencilPy, a domain-specific language and backend system that enables efficient, portable stencil computations on a wide range of current and emerging hardware architectures.
Findings
StencilPy achieves performance comparable to hand-written code.
It significantly reduces code length compared to manual implementations.
It provides cross-architecture portability and high productivity.
Abstract
Finite-difference methods based on high-order stencils are widely used in seismic simulations, weather forecasting, computational fluid dynamics, and other scientific applications. Achieving HPC-level stencil computations on one architecture is challenging, and porting to other architectures without sacrificing performance requires significant effort, especially in this golden age of many distinctive architectures. To help developers achieve performance, portability, and productivity with stencil computations, we developed StencilPy. With StencilPy, developers write stencil computations in a high-level domain-specific language, which promotes productivity, while its backends generate efficient code for existing and emerging architectures, including modern many-core CPUs (such as AMD Genoa-X, Fujitsu A64FX, and Intel Sapphire Rapids), latest generations of GPUs (including NVIDIA H100 and…
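To make the kind of kernel discussed in the abstract concrete, here is a minimal sketch of a high-order finite-difference stencil: an 8th-order 3D Laplacian on a uniform grid. It is written in plain NumPy and is not StencilPy's DSL, whose syntax this summary does not show. The function name `laplacian_3d`, the grid spacing `h`, and the choice to leave a 4-point boundary halo at zero are assumptions made for illustration only.

```python
import numpy as np

# Standard central-difference coefficients for the second derivative,
# 8th-order accurate (radius 4): c0 for the center, c1..c4 for the neighbors.
COEFFS = np.array([-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0])
RADIUS = 4


def laplacian_3d(u, h):
    """Apply an 8th-order Laplacian stencil to the interior of a 3D array.

    Hypothetical helper for illustration. Points within RADIUS of any
    face are left at zero rather than given a boundary treatment.
    """
    r = RADIUS
    nz, ny, nx = u.shape
    if min(nz, ny, nx) <= 2 * r:
        raise ValueError("each dimension must exceed 2 * RADIUS")

    out = np.zeros_like(u, dtype=float)
    # The center point contributes once per axis, hence the factor of 3.
    acc = 3.0 * COEFFS[0] * u[r:nz - r, r:ny - r, r:nx - r]
    for k in range(1, r + 1):
        acc = acc + COEFFS[k] * (
            u[r + k:nz - r + k, r:ny - r, r:nx - r]    # +k along axis 0
            + u[r - k:nz - r - k, r:ny - r, r:nx - r]  # -k along axis 0
            + u[r:nz - r, r + k:ny - r + k, r:nx - r]  # +k along axis 1
            + u[r:nz - r, r - k:ny - r - k, r:nx - r]  # -k along axis 1
            + u[r:nz - r, r:ny - r, r + k:nx - r + k]  # +k along axis 2
            + u[r:nz - r, r:ny - r, r - k:nx - r - k]  # -k along axis 2
        )
    out[r:nz - r, r:ny - r, r:nx - r] = acc / (h * h)
    return out
```

For a quadratic field such as u = x² + y² + z², this stencil reproduces the exact Laplacian (6) at interior points up to rounding error. Each output point reads 25 inputs, which gives a sense of why memory traffic and data reuse dominate the tuning effort for such kernels on each architecture.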
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
