A stencil-based implementation of Parareal in the C++ domain specific embedded language STELLA
Andrea Arteaga, Daniel Ruprecht, Rolf Krause

TL;DR
This paper implements the Parareal time-parallel method within the STELLA C++ DSL for stencil computations, combining spatial and temporal parallelism on CPU and GPU to improve performance for PDE solutions.
Contribution
It introduces an MPI-based Parareal implementation in STELLA, enabling combined spatial and temporal parallelism on CPU and GPU architectures.
Findings
Reports speedup for PDE solves from combining spatial parallelism within a node with temporal parallelism across nodes.
Evaluates the performance of Parareal with both the CPU (OpenMP) and GPU (CUDA) backends.
Analyzes energy-to-solution for different parallel configurations.
Abstract
In view of the rapid rise of the number of cores in modern supercomputers, time-parallel methods that introduce concurrency along the temporal axis are becoming increasingly popular. For the solution of time-dependent partial differential equations, these methods can add another direction for concurrency on top of spatial parallelization. The paper presents an implementation of the time-parallel Parareal method in a C++ domain specific language for stencil computations (STELLA). STELLA provides both an OpenMP and a CUDA backend for a shared memory parallelization, using the CPU or GPU inside a node for the spatial stencils. Here, we intertwine this node-wise spatial parallelism with the time-parallel Parareal. This is done by adding an MPI-based implementation of Parareal, which allows us to parallelize in time across nodes. The performance of Parareal with both backends is analyzed in…
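For readers unfamiliar with Parareal, the sketch below shows the standard iteration on a scalar test problem y' = λy. It is not the STELLA/MPI code from the paper: the function names (`euler`, `parareal`, `serial_fine`), the choice of explicit Euler for both propagators, and the serial emulation of the time slices are all assumptions made for illustration. A cheap coarse propagator G sweeps serially over the time slices, while the expensive fine propagator F is applied to every slice independently; in a distributed implementation that fine loop is the part that would run concurrently, one slice per node.

```cpp
#include <cassert>
#include <vector>

// Explicit Euler for y' = lambda * y over an interval of length dt,
// using `steps` equal substeps. Illustrative integrator, not from the paper.
double euler(double y, double lambda, double dt, int steps) {
    assert(steps > 0);
    const double h = dt / steps;
    for (int i = 0; i < steps; ++i) y += h * lambda * y;
    return y;
}

// Serial emulation of Parareal on [0, T] with `slices` time slices.
// Returns the approximations at the slice boundaries (size slices + 1).
std::vector<double> parareal(double y0, double lambda, double T,
                             int slices, int iterations, int fine_steps) {
    assert(slices > 0 && iterations >= 0);
    const double dt = T / slices;
    auto G = [&](double y) { return euler(y, lambda, dt, 1); };          // coarse
    auto F = [&](double y) { return euler(y, lambda, dt, fine_steps); }; // fine

    // Initial guess: one serial sweep with the coarse propagator.
    std::vector<double> u(slices + 1);
    u[0] = y0;
    for (int n = 0; n < slices; ++n) u[n + 1] = G(u[n]);

    std::vector<double> f(slices), g_old(slices);
    for (int k = 0; k < iterations; ++k) {
        // Slice-independent work: this loop is what would run in parallel.
        for (int n = 0; n < slices; ++n) {
            f[n] = F(u[n]);
            g_old[n] = G(u[n]);
        }
        // Serial correction sweep:
        // u_{n+1}^{k+1} = G(u_n^{k+1}) + F(u_n^k) - G(u_n^k).
        for (int n = 0; n < slices; ++n) {
            u[n + 1] = f[n] + (G(u[n]) - g_old[n]);
        }
    }
    return u;
}

// Reference: the fine integrator run serially over the whole interval.
double serial_fine(double y0, double lambda, double T,
                   int slices, int fine_steps) {
    return euler(y0, lambda, T, slices * fine_steps);
}
```

Calling `parareal(1.0, -1.0, 1.0, 4, 4, 100)` returns five slice-boundary values. With as many iterations as slices, the last value matches `serial_fine(1.0, -1.0, 1.0, 4, 100)` up to rounding, which is the known finite-termination property of Parareal. Speedup is only possible when the iteration converges in far fewer iterations than there are slices.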
