Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Andreas Kurth; Pirmin Vogel; Andrea Marongiu; Luca Benini

arXiv:1808.09751 · cs.AR · August 30, 2018

TL;DR

This paper introduces a scalable virtual memory sharing approach for heterogeneous SoCs that reduces TLB misses with prefetching, handles misses efficiently, and enables parallel DMA transfers without extra buffers, significantly improving performance.

Contribution

It presents a novel SVM solution with compiler-guided prefetching, parallel miss handling, and hardware support for DMA, enhancing scalability and efficiency in heterogeneous SoCs.

Findings

01. Up to 4x performance improvement for memory-intensive kernels.

02. 60% performance gain for irregular memory access patterns.

03. Effective TLB miss reduction and parallel DMA support.

Abstract

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our…
