Deep Learning based Data Prefetching in CPU-GPU Unified Virtual Memory
Xinjian Long, Xiangyang Gong, Huiyang Zhou

TL;DR
This paper introduces a deep learning-based page prefetching method for CPU-GPU unified virtual memory that uses a Transformer model to improve performance, raise the memory page hit rate, and reduce interconnect traffic.
Contribution
It demonstrates that Transformer models can effectively predict UVM page accesses and proposes a simplified, efficient model that outperforms existing prefetching schemes.
Findings
Performance improved by 10.89%
Memory page hit rate increased by 16.98%
Interconnect traffic reduced by 11.05%
Abstract
Unified Virtual Memory (UVM) relieves developers of the onus of maintaining complex data structures and explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging soon becomes a performance bottleneck of UVM due to the high latency caused by page table walks and data migration over the interconnect. Prefetching is considered a promising solution to this problem given its ability to leverage the locality of program memory access patterns. However, existing locality-based prefetching schemes cannot handle all situations. An ideal prefetcher should not only look at narrow regions of the requested address space but also capture global context to deliver a good prediction of the memory access pattern. This paper proposes a novel…
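To make the abstract's point about locality-based prefetching concrete, here is a minimal toy simulation in Python. It is not the paper's method or any real UVM driver policy: the `simulate` function, the unbounded resident set with no eviction, and the two synthetic traces are all assumptions made purely for illustration. On each page fault the sketch migrates the faulting page plus the next few contiguous pages, which helps a sequential trace but does nothing for a strided one.

```python
def simulate(trace, prefetch_degree):
    """Replay a page-access trace against a hypothetical GPU memory.

    On each fault, migrate the faulting page plus the next
    `prefetch_degree` contiguous pages. Returns the hit rate.
    Assumption: memory is unbounded, so pages are never evicted.
    """
    resident = set()  # page numbers currently resident on the GPU
    hits = 0
    for page in trace:
        if page in resident:
            hits += 1
        else:
            # Page fault: on-demand migration plus sequential prefetch.
            resident.update(range(page, page + prefetch_degree + 1))
    return hits / len(trace)


# Two synthetic traces of 64 accesses each (illustrative only).
sequential = list(range(64))            # pages 0, 1, 2, ...
strided = list(range(0, 64 * 16, 16))   # pages 0, 16, 32, ...

seq_demand = simulate(sequential, 0)        # pure on-demand paging
seq_prefetch = simulate(sequential, 7)      # fetch 7 extra pages per fault
strided_prefetch = simulate(strided, 7)     # same prefetcher, strided trace

print(seq_demand, seq_prefetch, strided_prefetch)
```

In this toy setting the sequential prefetcher turns 56 of the 64 sequential accesses into hits, while every strided access still faults, since the stride exceeds the prefetch window. A prefetcher that only looks at a narrow region around the faulting address misses such patterns, which is the gap the abstract attributes to existing locality-based schemes.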
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
