Improving Scalability with GPU-Aware Asynchronous Tasks
Jaemin Choi, David F. Richards, Laxmikant V. Kale

TL;DR
This paper integrates GPU-aware communication into GPU-accelerated asynchronous tasks and applies kernel fusion and CUDA Graphs, with the goal of improving scalability and GPU utilization, particularly under strong scaling.
Contribution
The paper integrates GPU-aware communication into asynchronous tasks and applies kernel fusion and CUDA Graphs to reduce fine-grained overheads, improving scalability on GPU platforms.
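As a rough illustration of why kernel fusion reduces overhead, the sketch below models each GPU kernel launch as a counted function call on the CPU. It is not the paper's implementation: the `stats` counter, the `launch` helper, and the `scale`, `add`, and `scale_add_fused` "kernels" are all hypothetical names chosen for this example. The point is only that two dependent element-wise steps can be merged into one pass, halving the number of launches and removing the intermediate buffer.

```python
# Hypothetical stand-in for a GPU runtime: every call to launch() counts as one kernel launch.
stats = {"launches": 0}


def launch(kernel, *args):
    """Run a 'kernel' (a plain Python function here) and count the launch."""
    stats["launches"] += 1
    return kernel(*args)


def scale(xs, a):
    """Kernel 1: multiply every element by a, producing an intermediate buffer."""
    return [a * x for x in xs]


def add(xs, ys):
    """Kernel 2: element-wise sum of two buffers."""
    return [x + y for x, y in zip(xs, ys)]


def scale_add_fused(xs, ys, a):
    """Fused kernel: same result as scale followed by add, in a single pass."""
    return [a * x + y for x, y in zip(xs, ys)]


def run_unfused(xs, ys, a):
    """Two launches and one intermediate buffer."""
    tmp = launch(scale, xs, a)
    return launch(add, tmp, ys)


def run_fused(xs, ys, a):
    """One launch and no intermediate buffer."""
    return launch(scale_add_fused, xs, ys, a)
```

On a real GPU the saving comes from fewer launch and synchronization costs per iteration, which matter most when each kernel does little work, as in the strong scaling regime the paper targets.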
Findings
Improved performance in the Jacobi3D benchmark with GPU-aware communication.
Reduced synchronization overheads and increased GPU concurrency.
Enhanced scalability in strong scaling scenarios.
Abstract
Asynchronous tasks, when created with over-decomposition, enable automatic computation-communication overlap which can substantially improve performance and scalability. This is not only applicable to traditional CPU-based systems, but also to modern GPU-accelerated platforms. While the ability to hide communication behind computation can be highly effective in weak scaling scenarios, performance begins to suffer with smaller problem sizes or in strong scaling due to fine-grained overheads and reduced room for overlap. In this work, we integrate GPU-aware communication into asynchronous tasks in addition to computation-communication overlap, with the goal of reducing time spent in communication and further increasing GPU utilization. We demonstrate the performance impact of our approach using a proxy application that performs the Jacobi iterative method, Jacobi3D. In addition to…
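For readers unfamiliar with the computation behind a Jacobi-style proxy application, the following is a minimal CPU-only sketch of one sweep of the Jacobi iterative method on a 3D grid. It assumes the classic 7-point stencil in which each interior cell is replaced by the mean of its six face neighbors; the abstract does not state the exact stencil used in Jacobi3D, and the function names `jacobi3d_sweep` and `make_grid` are hypothetical. In a distributed run, the boundary layers of each block would be filled by halo exchange with neighboring blocks, which is the communication the paper aims to accelerate and overlap.

```python
def make_grid(n, boundary=1.0, interior=0.0):
    """Build an n x n x n grid with fixed boundary values and a constant interior."""
    def on_boundary(i, j, k):
        return 0 in (i, j, k) or n - 1 in (i, j, k)

    return [
        [
            [boundary if on_boundary(i, j, k) else interior for k in range(n)]
            for j in range(n)
        ]
        for i in range(n)
    ]


def jacobi3d_sweep(grid):
    """Return a new grid after one Jacobi sweep (assumed 7-point stencil).

    Interior cells become the mean of their six face neighbors, read from the
    old grid only. Boundary cells are copied unchanged.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    new = [[row[:] for row in plane] for plane in grid]  # copy, boundaries included
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                new[i][j][k] = (
                    grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 6.0
    return new
```

Because every update reads only the previous iteration's values, all interior cells can be updated independently, which is what makes the method a natural fit for GPU kernels.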
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
