Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations
Jacob Merson, Mark S. Shephard

TL;DR
This paper compares two ways to improve GPU parallel performance when solving many small PDEs in a two-scale simulation code: using the NVIDIA Multi-Process Service, and moving the entire sub-problem loop into a single kernel with Kokkos hierarchical parallelism. The Kokkos approach gives the larger improvement.
Contribution
It develops and compares two strategies for GPU acceleration of many small PDEs, and shows that both improve parallel performance.
Findings
Kokkos hierarchical parallelism outperforms NVIDIA Multi-Process Service
Both methods improve GPU parallel performance
The Kokkos single-kernel method (the second of the two) yields the greatest performance gains
Abstract
This paper presents efforts to improve the hierarchical parallelism of a two-scale simulation code. Two methods to improve the GPU parallel performance were developed and compared. The first used the NVIDIA Multi-Process Service, and the second moved the entire sub-problem loop into a single kernel using Kokkos hierarchical parallelism and a PackedView data structure. Both approaches improved parallel performance, with the second method providing the greatest improvements.
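The abstract does not describe how PackedView is laid out or how the single kernel is organized, so the following is only a minimal serial sketch of the general idea, under stated assumptions: many small, differently sized sub-problem arrays are stored back to back in one flat buffer with an offsets array, and one two-level loop visits all of them. The names `PackedArrays` and `scale_all` are hypothetical and are not taken from the paper or from Kokkos. In a GPU version the outer loop would plausibly map to teams and the inner loop to the threads within a team.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PackedArrays:
    """Hypothetical stand-in for a packed container: many small 1-D
    arrays stored back to back in one flat buffer."""
    data: List[float]    # all entries, sub-problem after sub-problem
    offsets: List[int]   # offsets[i]..offsets[i + 1] bounds sub-problem i

    @classmethod
    def from_rows(cls, rows: Sequence[Sequence[float]]) -> "PackedArrays":
        data: List[float] = []
        offsets = [0]
        for row in rows:
            data.extend(row)
            offsets.append(len(data))
        return cls(data, offsets)

    def num_rows(self) -> int:
        return len(self.offsets) - 1

    def row_bounds(self, i: int) -> range:
        return range(self.offsets[i], self.offsets[i + 1])


def scale_all(packed: PackedArrays, factors: Sequence[float]) -> None:
    """Scale every sub-problem by its own factor in one pass over the buffer."""
    # Outer level: one iteration per sub-problem (assumed to correspond
    # to one team in a hierarchical GPU kernel).
    for i in range(packed.num_rows()):
        # Inner level: the entries of that sub-problem (assumed to be
        # shared among the threads of the team).
        for k in packed.row_bounds(i):
            packed.data[k] *= factors[i]


# Three sub-problems of different sizes handled by a single call.
packed = PackedArrays.from_rows([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
scale_all(packed, [2.0, 10.0, 0.5])
```

The point of the layout is that sub-problems of different sizes share one allocation and one launch, instead of one small launch per sub-problem. That is the motivation suggested by the abstract's phrase "moved the entire sub-problem loop into a single kernel".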
Taxonomy
Topics: Simulation Techniques and Applications · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
