In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code
Michael E. Rowan, Axel Huebl, Kevin N. Gott, Jack Deslippe, Maxence Thévenet, Remi Lehe, Jean-Luc Vay

TL;DR
This paper introduces GPU-specific strategies for assessing compute work to improve dynamic load balancing in GPU-accelerated PIC codes, significantly enhancing performance on large-scale GPU systems.
Contribution
It presents novel GPU-amenable methods for in-situ compute work assessment and demonstrates their effectiveness in optimizing load balancing for large-scale GPU applications.
Findings
Achieved 62-74% of the theoretically predicted maximum speedup on Summit with up to 6144 GPUs.
On 96 GPUs, dynamic load balancing yields a 3.8x speedup over running without load balancing.
Benchmarking identified the best-performing data collection strategy for in-situ assessment of GPU compute work.
Abstract
Maintaining computational load balance is important to the performance of codes that operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches that explicitly target GPU architectures, and we explore the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the best data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a…
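To make the idea of load balancing from measured compute work concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in WarpX: the per-box cost values are invented, the function names (`greedy_assign`, `rank_loads`, `efficiency`) are hypothetical, and the greedy heaviest-first heuristic is a generic stand-in for whatever distribution algorithm a production code applies. Efficiency is defined here as mean rank load divided by maximum rank load, so 1.0 means perfect balance.

```python
import heapq


def greedy_assign(costs, n_ranks):
    """Assign boxes to ranks, heaviest box first, always to the least-loaded rank."""
    heap = [(0.0, rank) for rank in range(n_ranks)]  # (current load, rank id)
    heapq.heapify(heap)
    owner = [None] * len(costs)
    for box in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, rank = heapq.heappop(heap)
        owner[box] = rank
        heapq.heappush(heap, (load + costs[box], rank))
    return owner


def rank_loads(costs, owner, n_ranks):
    """Sum the cost of all boxes owned by each rank."""
    loads = [0.0] * n_ranks
    for cost, rank in zip(costs, owner):
        loads[rank] += cost
    return loads


def efficiency(loads):
    """Mean load over max load: 1.0 is perfectly balanced."""
    return (sum(loads) / len(loads)) / max(loads)


# Hypothetical measured cost per box (arbitrary units), e.g. from kernel timers.
costs = [8.0, 1.0, 1.0, 1.0, 7.0, 1.0, 1.0, 4.0]
n_ranks = 4

# Baseline: contiguous blocks of boxes per rank, ignoring measured cost.
static_owner = [i * n_ranks // len(costs) for i in range(len(costs))]
static_loads = rank_loads(costs, static_owner, n_ranks)

# Cost-aware redistribution using the measured costs.
balanced_owner = greedy_assign(costs, n_ranks)
balanced_loads = rank_loads(costs, balanced_owner, n_ranks)

print("static efficiency:  ", efficiency(static_loads))
print("balanced efficiency:", efficiency(balanced_loads))
```

In this toy case the heaviest single box bounds the achievable maximum load, so no assignment can reach an efficiency of 1.0. The quality of any such redistribution depends on how accurately the per-box costs reflect actual device-side work, which is the quantity the paper's in-situ assessment strategies aim to measure.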
