TL;DR
This paper evaluates the performance impact of IOMMU-based shared virtual addressing in RISC-V embedded heterogeneous SoCs, demonstrating its viability for efficient data sharing and offloading in systems with caches.
Contribution
It provides a quantitative analysis of shared virtual addressing performance in RISC-V SoCs, integrating an IOMMU and evaluating on FPGA with benchmark kernels.
Findings
IO virtual address translation accounts for up to 17.6% of runtime in a system without a cache.
With a last-level cache, translation overhead drops below 1%.
Shared virtual addressing is suitable for RISC-V heterogeneous SoCs with caches.
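To put the two percentages in perspective, the short sketch below converts a translation share of runtime into the largest speedup that removing translation entirely could deliver. This is a simple Amdahl-style bound, not a calculation taken from the paper, and the function name is hypothetical.

```python
def max_speedup_if_removed(fraction: float) -> float:
    """Upper bound on speedup if a component taking `fraction` of
    total runtime were eliminated completely (Amdahl-style bound)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction)


# Shares reported in the findings above: 17.6% (no cache), 1% (upper bound with LLC).
no_cache_bound = max_speedup_if_removed(0.176)
with_llc_bound = max_speedup_if_removed(0.01)

if __name__ == "__main__":
    print(f"no cache: at most {no_cache_bound:.3f}x")
    print(f"with LLC: at most {with_llc_bound:.3f}x")
```

Under this bound, the 17.6% case leaves roughly a 1.2x gain on the table, while at 1% or below the remaining gain is about 1%, which is consistent with the conclusion that a last-level cache makes the overhead negligible.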
Abstract
Embedded heterogeneous systems-on-chip (SoCs) rely on domain-specific hardware accelerators to improve performance and energy efficiency. In particular, programmable multi-core accelerators feature a cluster of processing elements and tightly coupled scratchpad memories to balance performance, energy efficiency, and flexibility. In embedded systems running a general-purpose OS, accelerators access data via dedicated, physically addressed memory regions. This negatively impacts memory utilization and performance by requiring a copy from the virtual host address to the physical accelerator address space. Input-Output Memory Management Units (IOMMUs) overcome this limitation by allowing devices and hosts to use a shared virtual paged address space. However, resolving IO virtual addresses can be particularly costly on high-latency memory systems as it requires up to three sequential memory…
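The cost of resolving an IO virtual address comes from walking a multi-level page table, where each level needs one dependent memory read. The sketch below models such a walk, assuming the three-level RISC-V Sv39 format (9-bit index per level, 4 KiB pages, 8-byte entries); the excerpt does not name the paging mode, so Sv39 is an assumption, and `walk`, `map_4k`, and the dictionary-based memory model are hypothetical helpers for illustration only. It ignores permission checks, accessed/dirty bits, and any IOMMU device or process context lookup.

```python
PAGE_SHIFT = 12          # 4 KiB pages
VPN_BITS = 9             # index bits per level (Sv39, assumed)
LEVELS = 3
PTE_SIZE = 8             # bytes per page-table entry
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3
PPN_MASK = (1 << 44) - 1


class PageFault(Exception):
    pass


def make_pte(ppn: int, flags: int) -> int:
    """Pack a physical page number and flag bits into an entry."""
    return (ppn << 10) | flags


def vpn_index(va: int, level: int) -> int:
    return (va >> (PAGE_SHIFT + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)


def walk(memory: dict, root_ppn: int, va: int):
    """Translate `va`; return (physical_address, number_of_memory_reads)."""
    table = root_ppn << PAGE_SHIFT
    reads = 0
    for level in range(LEVELS - 1, -1, -1):
        pte = memory.get(table + vpn_index(va, level) * PTE_SIZE, 0)
        reads += 1                      # each level is one dependent read
        if not pte & PTE_V:
            raise PageFault(hex(va))
        ppn = (pte >> 10) & PPN_MASK
        if pte & (PTE_R | PTE_X):       # leaf entry: translation is done
            mask = (1 << (PAGE_SHIFT + VPN_BITS * level)) - 1
            base = ppn << PAGE_SHIFT
            if base & mask:             # misaligned superpage
                raise PageFault(hex(va))
            return base | (va & mask), reads
        table = ppn << PAGE_SHIFT       # pointer to the next-level table
    raise PageFault(hex(va))


def map_4k(memory, root_ppn, va, leaf_ppn, l1_ppn, l0_ppn):
    """Install a 4 KiB mapping using caller-chosen table pages."""
    memory[(root_ppn << PAGE_SHIFT) + vpn_index(va, 2) * PTE_SIZE] = make_pte(l1_ppn, PTE_V)
    memory[(l1_ppn << PAGE_SHIFT) + vpn_index(va, 1) * PTE_SIZE] = make_pte(l0_ppn, PTE_V)
    memory[(l0_ppn << PAGE_SHIFT) + vpn_index(va, 0) * PTE_SIZE] = make_pte(
        leaf_ppn, PTE_V | PTE_R | PTE_W)
```

In this model, a 4 KiB mapping costs three reads on a miss, whereas a mapping that terminates at a higher level (a superpage) costs fewer. On a high-latency memory system each of those reads pays the full memory latency, which is why caching page-table entries closer to the device can reduce the overhead so sharply.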
