NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; John Kim; Minsoo Rhu

arXiv:1911.06859 · cs.AR · November 19, 2019 · 1 citation

TL;DR

This paper proposes NeuMMU, a memory management unit (MMU) tailored for neural processing units (NPUs). It adds virtual-to-physical address translation to NPUs at an average performance overhead of only 0.06% relative to an oracular MMU.

Contribution

The paper introduces an MMU tailored for NPUs that supports virtual-to-physical address translation and addresses the limitations of prior GPU-centric translation schemes.

Findings

01

NeuMMU incurs an average performance overhead of only 0.06% compared to an oracular MMU design point.

02

A data-driven application characterization study identifies the root causes of several limitations in prior GPU-centric address translation schemes.

03

NeuMMU effectively supports address translation in NPU architectures.

Abstract

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely being utilized for accelerating deep learning algorithms. Similar to how GPUs have evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address space. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06% performance overhead.

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing