NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

TL;DR
This paper proposes NeuMMU, a memory management unit (MMU) tailored for neural processing units (NPUs). It supports virtual-to-physical address translation while incurring an average performance overhead of only 0.06% relative to an oracular MMU design point.
Contribution
It introduces a tailored memory management unit (MMU) for NPUs, addressing limitations of GPU-centric schemes and supporting virtual-to-physical address translation.
Findings
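NeuMMU incurs an average performance overhead of only 0.06% compared to an oracular MMU design point.
A data-driven application characterization study identifies the root causes of several limitations in prior GPU-centric address translation schemes.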
NeuMMU effectively supports address translation in NPU architectures.
Abstract
To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely being utilized for accelerating deep learning algorithms. Similar to how GPUs have evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address space. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06% performance overhead.
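To make the idea of decoupling virtual and physical address spaces concrete, the following is a minimal Python sketch of a generic translation path: a small TLB in front of a flat page table, plus a helper that expresses overhead relative to an oracle. It is not NeuMMU's design, which the abstract does not detail. The names `ToyMMU`, `overhead_percent`, the 4 KiB page size, the LRU policy, and the flat page table are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages; the source does not state a page size


class ToyMMU:
    """Hypothetical toy MMU: an LRU TLB in front of a flat page table."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = OrderedDict()      # cached translations, least recently used first
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return the physical address for a virtual address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
            self.tlb.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if vpn not in self.page_table:
                raise KeyError(f"page fault: virtual page {vpn} is unmapped")
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
            self.tlb[vpn] = self.page_table[vpn]  # stands in for a page-table walk
        return self.tlb[vpn] * PAGE_SIZE + offset


def overhead_percent(cycles, oracle_cycles):
    """Slowdown of a design relative to an oracle with free translation, in percent."""
    return 100.0 * (cycles - oracle_cycles) / oracle_cycles


mmu = ToyMMU({0: 7, 1: 3})
paddr = mmu.translate(4100)  # virtual page 1, offset 4 -> frame 3 -> 3 * 4096 + 4
```

In this sketch, a repeated access to the same page hits the TLB and skips the page-table lookup. A real MMU pays a multi-cycle page-table walk on each miss, which is the kind of cost an oracular design point is assumed not to pay.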
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing
