Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
Andreas Kurth, Pirmin Vogel, Andrea Marongiu, Luca Benini

TL;DR
This paper introduces a scalable shared virtual memory (SVM) design for heterogeneous SoCs that avoids most TLB misses through prefetching, handles the remaining misses efficiently, and supports parallel DMA burst transfers without additional buffers.
Contribution
The paper presents an SVM solution that combines compiler-guided TLB prefetching, parallel miss handling, and an MMU-aware DMA engine to improve scalability and efficiency in heterogeneous SoCs.
Findings
Up to 4x performance improvement for memory-intensive kernels.
60% performance gain for irregular memory access patterns.
Effective TLB miss reduction and parallel DMA support.
Abstract
Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our…
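Illustration
The abstract's central claim, that prefetching translations ahead of use removes most TLB misses, can be seen in a toy model. The sketch below is not the paper's mechanism: the paper's prefetching is compiler-guided and its TLB is a hardware structure, whereas this is a small software simulation. The class ToyTLB, the functions stream and compare, the 4 KiB page size, the 8-entry capacity, and the LRU replacement policy are all assumptions made only for this example. It streams over a buffer once without prefetching and once while requesting the next page's translation before each access.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; assumed page size for this toy model


class ToyTLB:
    """Fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> present flag
        self.misses = 0

    def _install(self, vpn):
        # Refresh recency if present, otherwise insert and evict if full.
        if vpn in self.map:
            self.map.move_to_end(vpn)
            return
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict least recently used entry
        self.map[vpn] = True

    def prefetch(self, vaddr):
        """Install a translation ahead of use; not counted as a miss."""
        self._install(vaddr // PAGE_SIZE)

    def access(self, vaddr):
        """Translate an address, counting a miss if the page is absent."""
        vpn = vaddr // PAGE_SIZE
        if vpn not in self.map:
            self.misses += 1
        self._install(vpn)


def stream(tlb, n_bytes, stride, prefetch_next_page):
    """Walk a buffer linearly and return the number of TLB misses."""
    for addr in range(0, n_bytes, stride):
        if prefetch_next_page:
            tlb.prefetch(addr + PAGE_SIZE)
        tlb.access(addr)
    return tlb.misses


def compare(n_pages=64, entries=8, stride=64):
    """Return (misses without prefetching, misses with prefetching)."""
    n_bytes = n_pages * PAGE_SIZE
    without = stream(ToyTLB(entries), n_bytes, stride, False)
    with_pf = stream(ToyTLB(entries), n_bytes, stride, True)
    return without, with_pf
```

In this model every first touch of a page misses without prefetching, while with next-page prefetching only the very first access misses. Real workloads are less regular, and a prefetch has its own cost that this sketch ignores, so the numbers say nothing about the speedups reported in the paper.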
