Address Translation Design Tradeoffs for Heterogeneous Systems

Yunsung Kim; Guilherme Cox; Martha A. Kim; Abhishek Bhattacharjee

arXiv:1707.09450·cs.AR·August 1, 2017



TL;DR

This paper explores the design space of memory management units (MMUs) in heterogeneous systems, revealing that accelerators should have dedicated, application-specific MMUs for optimal performance and efficiency.

Contribution

It provides a comprehensive analysis of MMU design tradeoffs in heterogeneous systems, emphasizing the importance of independent, application-specific MMUs for accelerators.

Findings

01. Accelerators should not rely on CPU MMUs for address translation.

02. Small, standard TLBs can cause significant performance overhead.

03. Performance, area, and energy efficiency depend on workload-specific MMU component configurations.

Abstract

This paper presents a broad, pathfinding design space exploration of memory management units (MMUs) for heterogeneous systems. We consider a variety of designs, ranging from accelerators tightly coupled with CPUs (and using their MMUs) to fully independent accelerators that have their own MMUs. We find that, regardless of how the CPU and accelerator communicate, accelerators should not rely on the CPU MMU for any aspect of address translation, and instead must have their own local, fully-fledged MMUs. Such an MMU, however, can and should be as application-specific as the accelerator itself, as our data indicate that even a 100% hit rate in a small, standard L1 Translation Lookaside Buffer (TLB) presents a substantial accelerator performance overhead. Furthermore, we isolate the benefits of individual MMU components (e.g., TLBs versus page table walkers) and discover that their relative…
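To make the TLB claim concrete, here is a minimal back-of-the-envelope sketch of why a TLB that always hits can still cost performance. It assumes translation is fully serialized before each memory access, which may not match the designs the paper evaluates. The function names and all cycle counts are hypothetical illustrations, not figures from the paper.

```python
def avg_access_cycles(mem_cycles, tlb_hit_cycles, hit_rate, walk_cycles):
    """Average cycles per access, assuming (hypothetically) that the TLB
    lookup is serialized before the memory access and that every miss
    adds one full page table walk."""
    translation = tlb_hit_cycles + (1.0 - hit_rate) * walk_cycles
    return translation + mem_cycles


def translation_overhead(mem_cycles, tlb_hit_cycles, hit_rate, walk_cycles):
    """Fractional slowdown relative to an access with no translation."""
    total = avg_access_cycles(mem_cycles, tlb_hit_cycles, hit_rate, walk_cycles)
    return (total - mem_cycles) / mem_cycles


# Illustrative numbers only: a 4-cycle local memory access, a 1-cycle TLB
# lookup, a 100-cycle page table walk.
perfect_tlb = translation_overhead(4, 1, 1.0, 100)   # every lookup hits
lossy_tlb = translation_overhead(4, 1, 0.99, 100)    # 1% of lookups miss

print(f"overhead with 100% hit rate: {perfect_tlb:.0%}")
print(f"overhead with 99% hit rate:  {lossy_tlb:.0%}")
```

Under these assumed numbers the lookup latency alone adds a quarter to every access even with no misses, because the accelerator's memory accesses are short relative to the lookup. A design that overlaps translation with the access would change this picture.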


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed systems and fault tolerance