Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations

Jacob Merson; Mark S. Shephard

arXiv:2305.07030·cs.DC·May 15, 2023·1 cites

Open Access

TL;DR

This paper explores hierarchical parallelism techniques to enhance GPU performance in solving many small PDEs, comparing NVIDIA Multi-Process Service and Kokkos-based methods, with the latter showing superior results.

Contribution

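It develops and compares two strategies for improving the GPU parallel performance of a two-scale simulation code that solves many small PDE sub-problems: running under the NVIDIA Multi-Process Service, and moving the entire sub-problem loop into a single kernel using Kokkos hierarchical parallelism and a PackedView data structure.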

Findings

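01. The Kokkos hierarchical parallelism approach outperforms the NVIDIA Multi-Process Service approach.

02. Both methods improve GPU parallel performance.

03. The second method, which moves the entire sub-problem loop into a single Kokkos kernel, yields the greatest performance gains.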

Abstract

This paper presents efforts to improve the hierarchical parallelism of a two-scale simulation code. Two methods to improve the GPU parallel performance were developed and compared. The first used the NVIDIA Multi-Process Service, and the second moved the entire sub-problem loop into a single kernel using Kokkos hierarchical parallelism and a PackedView data structure. Both approaches improved parallel performance, with the second method providing the greatest improvements.
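To make the second approach more concrete, the following is a minimal sketch in plain C++ (not Kokkos, and not the authors' code). The abstract does not describe how PackedView is laid out, so this sketch assumes the general idea of storing the data for many variably sized sub-problems in one contiguous buffer with an offsets array. The names `PackedData`, `pack`, and `scale_all` are hypothetical. The nested loop only stands in for a two-level kernel: in Kokkos, the outer loop would typically map to teams through `Kokkos::TeamPolicy` and the inner loop to threads through `Kokkos::TeamThreadRange`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packed container (an assumption, not the paper's PackedView):
// every sub-problem's data lives in one flat buffer, and the half-open range
// [offsets[i], offsets[i + 1]) delimits sub-problem i.
struct PackedData {
    std::vector<double> values;        // concatenated data of all sub-problems
    std::vector<std::size_t> offsets;  // number of sub-problems + 1 entries

    std::size_t num_problems() const {
        return offsets.empty() ? 0 : offsets.size() - 1;
    }
    std::size_t size_of(std::size_t i) const {
        assert(i < num_problems());
        return offsets[i + 1] - offsets[i];
    }
    double* row(std::size_t i) {
        assert(i < num_problems());
        return values.data() + offsets[i];
    }
};

// Pack ragged per-problem arrays into a single contiguous buffer.
PackedData pack(const std::vector<std::vector<double>>& problems) {
    PackedData p;
    p.offsets.reserve(problems.size() + 1);
    p.offsets.push_back(0);
    for (const auto& prob : problems) {
        p.values.insert(p.values.end(), prob.begin(), prob.end());
        p.offsets.push_back(p.values.size());
    }
    return p;
}

// One pass over every sub-problem, standing in for a single kernel launch.
// Outer loop: one unit of work per sub-problem (a team, in a two-level scheme).
// Inner loop: the entries of that sub-problem (threads within the team).
void scale_all(PackedData& p, double factor) {
    for (std::size_t i = 0; i < p.num_problems(); ++i) {
        double* x = p.row(i);
        const std::size_t n = p.size_of(i);
        for (std::size_t j = 0; j < n; ++j) {
            x[j] *= factor;
        }
    }
}
```

Calling `pack` once and then `scale_all` touches every sub-problem in one pass over one buffer, instead of issuing one small operation per sub-problem. That is the motivation for moving the whole sub-problem loop into a single kernel.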

Taxonomy

Topics: Simulation Techniques and Applications · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques