Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight
Yuang Yan, Ian Karlin, Ryan Grant

TL;DR
This paper unveils the hidden command streams of NVIDIA's closed-source GPU driver, enabling detailed analysis of GPU behavior and performance for better optimization and hardware-software co-design.
Contribution
We recover complete hardware command streams from NVIDIA's closed-source driver using open-source kernel components, providing new insights into GPU command submission and performance.
Findings
Identified DMA submission modes and characterized the performance of each mode independently.
Showed that reduced launch overhead correlates with smaller command footprints.
Demonstrated that command-level visibility improves understanding and optimization of GPU middleware.
Abstract
For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace driver. As a result, the translation from high-level CUDA APIs to low-level hardware commands remains opaque, limiting both software understanding and performance attribution. This paper makes that command path visible. We recover the hardware command streams emitted by NVIDIA's closed-source userspace driver with full integrity by leveraging the recently open-sourced kernel driver, instrumenting the memory-mapping path, and installing a hardware watchpoint on the userspace mapping of the GPU doorbell register. This lets us capture complete command submissions at the moment they are committed. Using this methodology, we present two case studies. For CUDA data movement, we identify the DMA…
