Performance Evaluation of Advanced Features in CUDA Unified Memory
Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

TL;DR
This paper evaluates two advanced CUDA Unified Memory features, memory advises and asynchronous prefetch, on two platforms (Intel-Volta/Pascal-PCIe and Power9-Volta-NVLink). Their benefit depends on the platform and on whether the working set fits in GPU memory or oversubscribes it.
Contribution
It provides a benchmark-based analysis of the new CUDA Unified Memory features on two platforms that differ in CPU, GPU, and interconnect, covering both in-memory and oversubscription executions and highlighting the features' performance benefits and limitations.
Findings
On Intel-Volta/Pascal-PCIe, memory advises bring negligible in-memory improvement but up to 25% improvement over basic CUDA Unified Memory when GPU memory is oversubscribed by about 50%.
On Power9-Volta-NVLink, memory advises yield up to 34% in-memory performance gains.
Prefetching improves performance by up to 50% on the Intel platform but has limited effect on the Power9 platform.
Abstract
CUDA Unified Memory improves GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to evaluate both in-memory and oversubscription performance. The results show that memory advises on the Intel-Volta/Pascal-PCIe platform bring negligible improvement for in-memory executions. However, when GPU memory is oversubscribed by about 50%, using memory advises results in up to 25% performance improvement compared to the basic CUDA Unified Memory. In contrast, the Power9-Volta-NVLink platform can substantially benefit from memory advises, achieving up to 34% performance gain for in-memory…
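For readers unfamiliar with the two features, the following is a minimal sketch of how memory advises and asynchronous prefetch are commonly applied to a managed allocation. It is not the authors' benchmark code. The kernel `saxpy`, the helper `run_saxpy`, and the `CHECK` macro are hypothetical names introduced here for illustration. The sketch assumes a CUDA toolkit earlier than 13, where `cudaMemAdvise` and `cudaMemPrefetchAsync` take an `int` device argument, and a Linux system with a Pascal-or-newer GPU that supports concurrent managed access.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: y[i] = a * x[i] + y[i].
__global__ void saxpy(size_t n, float a, const float *x, float *y) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Runs saxpy on managed memory using advises and prefetch; returns y[0].
float run_saxpy(size_t n) {
    int dev = 0;
    CHECK(cudaGetDevice(&dev));
    const size_t bytes = n * sizeof(float);

    // Managed allocations are accessible from both host and device.
    float *x = nullptr, *y = nullptr;
    CHECK(cudaMallocManaged(&x, bytes));
    CHECK(cudaMallocManaged(&y, bytes));
    for (size_t i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Memory advises: x is only read by the kernel, so mark it read-mostly;
    // y is updated on the GPU, so prefer to keep its pages there.
    // (Assumption: pre-CUDA-13 signatures with an int device argument.)
    CHECK(cudaMemAdvise(x, bytes, cudaMemAdviseSetReadMostly, dev));
    CHECK(cudaMemAdvise(y, bytes, cudaMemAdviseSetPreferredLocation, dev));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Asynchronous prefetch: move pages to the GPU before the kernel runs
    // instead of relying on on-demand page faults.
    CHECK(cudaMemPrefetchAsync(x, bytes, dev, stream));
    CHECK(cudaMemPrefetchAsync(y, bytes, dev, stream));

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    saxpy<<<blocks, threads, 0, stream>>>(n, 3.0f, x, y);
    CHECK(cudaGetLastError());

    // Prefetch the result back to host memory before the CPU reads it.
    CHECK(cudaMemPrefetchAsync(y, bytes, cudaCpuDeviceId, stream));
    CHECK(cudaStreamSynchronize(stream));

    const float result = y[0];
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(x));
    CHECK(cudaFree(y));
    return result;
}

int main() {
    std::printf("y[0] = %f\n", run_saxpy(1 << 20));
    return 0;
}
```

In this sketch the working set fits in GPU memory. An oversubscription run in the spirit of the paper would size the allocations beyond the GPU's capacity, in which case prefetching the whole range at once is not possible and the advises mainly guide which pages are evicted or duplicated.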
