Enabling predictable parallelism in single-GPU systems with persistent CUDA threads
Paolo Burgio

TL;DR
This paper introduces a method to achieve predictable parallelism in single-GPU systems by leveraging persistent CUDA threads, addressing the challenge of providing timing guarantees in real-time applications.
Contribution
It presents a novel approach that opens the GPU architecture for intra-GPU predictable execution, moving beyond treating the GPU as a monolithic device.
Findings
Enables predictable execution within modern GPU architectures.
Provides a framework for real-time guarantees in GPU computing.
Improves timing predictability for safety-critical applications.
Abstract
Graphics Processing Units, or GPUs, have been successfully adopted both for graphics computation in 3D applications and for general-purpose applications (GP-GPUs), thanks to their tremendous performance-per-watt. Recently, there has been great interest in adopting them within automotive and avionic industrial settings as well, imposing real-time constraints on the design of such devices for the first time. Unfortunately, it is extremely hard to extract timing guarantees from modern GPU designs, and current approaches rely on a model where the GPU is treated as a single monolithic execution device. Unlike the state of the art in research, we try to open the box of modern GPU architectures, providing a clean way to exploit intra-GPU predictable execution.
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Real-Time Systems Scheduling · Embedded Systems Design Techniques
