From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels
Gregor Daiß, Patrick Diehl, Dominic Marcello, Alireza Kheirkhahan, Hartmut Kaiser, Dirk Pflüger

TL;DR
This paper explores strategies for aggregating fine-grained CPU tasks into larger GPU kernels in Octo-Tiger, an astrophysics simulation, to improve GPU resource utilization and performance portability across hardware.
Contribution
It introduces a new GPU work aggregation strategy and evaluates multiple approaches to enhance GPU performance in adaptive astrophysics simulations.
Findings
Achieved noticeable speedups on AMD and NVIDIA GPUs.
Demonstrated the effectiveness of work aggregation for GPU performance.
Improved scalability and resource utilization in Octo-Tiger.
Abstract
Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple different GPU work aggregation strategies within Octo-Tiger,…
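To make the idea of work aggregation concrete, the following is a minimal, library-free sketch in Python. It is not Octo-Tiger's implementation and uses neither HPX nor Kokkos; the names `Aggregator`, `scale_kernel`, and `max_batch` are hypothetical and chosen only for illustration. It assumes that each fine-grained task carries a small sub-grid of data and that a single "kernel" call can process a whole batch of such sub-grids at once, so that many small launches collapse into a few larger ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Grid = List[float]  # one small sub-grid, as produced by a fine-grained task


@dataclass
class Aggregator:
    """Hypothetical helper: buffers small tasks and runs them as one batched launch."""
    kernel: Callable[[List[Grid]], List[Grid]]  # processes a whole batch at once
    max_batch: int                              # launch as soon as this many tasks wait
    pending: List[Grid] = field(default_factory=list)
    results: List[Grid] = field(default_factory=list)
    launches: int = 0

    def submit(self, task_data: Grid) -> None:
        # Buffer the task instead of launching a tiny kernel for it right away.
        self.pending.append(task_data)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        # Launch one aggregated kernel over everything buffered so far.
        if not self.pending:
            return
        self.results.extend(self.kernel(self.pending))
        self.pending = []
        self.launches += 1


def scale_kernel(batch: List[Grid]) -> List[Grid]:
    # Stand-in for a GPU kernel: doubles every cell of every sub-grid in the batch.
    return [[2.0 * x for x in grid] for grid in batch]


# Ten small tasks with a batch size of four: two full batches plus one final flush.
agg = Aggregator(kernel=scale_kernel, max_batch=4)
for i in range(10):
    agg.submit([float(i)] * 8)
agg.flush()  # launch whatever is left over
```

In this sketch the ten submitted tasks result in three launches rather than ten. The batch size is the key trade-off: a larger `max_batch` gives the device more work per launch, while a smaller one lets buffered tasks start sooner.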
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
