Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms
Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang

TL;DR
Neu10 introduces a comprehensive framework for virtualizing neural processing units in cloud platforms, enabling better resource sharing, cost efficiency, and improved ML inference performance through hardware and software innovations.
Contribution
The paper presents a holistic NPU virtualization framework that combines a flexible abstraction (vNPU), a resource allocator, and an ISA extension to address key challenges in virtualizing NPUs on modern cloud platforms.
Findings
Up to 1.4× throughput improvement for ML inference
Tail latency reduced by up to 4.6×
NPU utilization increased by 1.2× on average
Abstract
Cloud platforms today have been deploying hardware accelerators like neural processing units (NPUs) for powering machine learning (ML) inference services. To maximize the resource utilization while ensuring reasonable quality of service, a natural approach is to virtualize NPUs for efficient resource sharing for multi-tenant ML services. However, virtualizing NPUs for modern cloud platforms is not easy. This is not only due to the lack of system abstraction support for NPU hardware, but also due to the lack of architectural and ISA support for enabling fine-grained dynamic operator scheduling for virtualized NPUs. We present Neu10, a holistic NPU virtualization framework. We investigate virtualization techniques for NPUs across the entire software and hardware stack. Neu10 consists of (1) a flexible NPU abstraction called vNPU, which enables fine-grained virtualization of the…
Taxonomy
Topics: Neural Networks and Applications
