Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc
Jacob Faibussowitsch, Mark F. Adams, Richard Tran Mills, Stefano Zampini, Junchao Zhang

TL;DR
This paper introduces a unified asynchronous programming model for PETSc that enables efficient, scalable, and user-friendly integration of GPU streams, significantly improving performance in scientific computations.
Contribution
It presents a novel, unified asynchronous programming model for PETSc that addresses integration challenges of GPU streams in scientific libraries.
Findings
Broad performance improvements demonstrated
Enhanced ease of use for library developers
Effective latency hiding in GPU computations
Abstract
Leveraging Graphics Processing Units (GPUs) to accelerate scientific software has proven to be highly successful, but in order to extract more performance, GPU programmers must overcome the high latency costs associated with their use. One method of reducing or hiding this latency cost is to use asynchronous streams to issue commands to the GPU. While performant, the streams model is an invasive abstraction, and has therefore proven difficult to integrate into general-purpose libraries. In this work, we enumerate the difficulties specific to library authors in adopting streams, and present recent work on addressing them. Finally, we present a unified asynchronous programming model for use in the Portable, Extensible Toolkit for Scientific Computation (PETSc) to overcome these challenges. The new model shows broad performance benefits while remaining ergonomic to the user.
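To make the stream idea in the abstract concrete, here is a minimal CPU-side sketch of stream semantics: work is enqueued without blocking, runs in submission order, and is only guaranteed complete after an explicit synchronization. It is an analogy built on a Python worker thread, not the PETSc model or any GPU API; `ToyStream`, `scale`, and `shift` are hypothetical names introduced only for illustration.

```python
import queue
import threading


class ToyStream:
    """Hypothetical stand-in for a GPU stream: tasks run in FIFO order on a worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Pull tasks one at a time so that execution order matches submission order.
        while True:
            func, args = self._tasks.get()
            try:
                func(*args)
            finally:
                self._tasks.task_done()

    def enqueue(self, func, *args):
        # Returns immediately; the caller does not wait for func to finish.
        self._tasks.put((func, args))

    def synchronize(self):
        # Block until every task enqueued so far has completed.
        self._tasks.join()


def scale(buf, factor):
    for i in range(len(buf)):
        buf[i] *= factor


def shift(buf, offset):
    for i in range(len(buf)):
        buf[i] += offset


stream = ToyStream()
data = [1.0, 2.0, 3.0]
stream.enqueue(scale, data, 2.0)  # queued, not waited on
stream.enqueue(shift, data, 1.0)  # runs after scale: same stream, FIFO order
host_side = sum(range(10))        # host work can overlap with the queued tasks
stream.synchronize()              # data is only safe to read after this point
print(data, host_side)
```

The sketch also hints at why the abstract calls streams invasive: every routine that touches `data` has to know which stream it was queued on and when that stream was last synchronized, which is the kind of bookkeeping a general-purpose library must otherwise expose to its callers.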
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
