Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms

Yuqi Xue; Yiqi Liu; Lifeng Nai; Jian Huang

arXiv:2408.04104·cs.AR·September 16, 2024

TL;DR

Neu10 introduces a comprehensive framework for virtualizing neural processing units in cloud platforms, enabling better resource sharing, cost efficiency, and improved ML inference performance through hardware and software innovations.

Contribution

It presents a novel holistic NPU virtualization framework with a flexible abstraction, resource allocator, and ISA extension, addressing key challenges in modern cloud NPU virtualization.

Findings

01 Up to 1.4× throughput improvement for ML inference

02 Tail latency reduced by up to 4.6×

03 NPU utilization increased by 1.2× on average

Abstract

Cloud platforms today deploy hardware accelerators such as neural processing units (NPUs) to power machine learning (ML) inference services. To maximize resource utilization while ensuring reasonable quality of service, a natural approach is to virtualize NPUs so that multi-tenant ML services can share them efficiently. However, virtualizing NPUs for modern cloud platforms is not easy, owing both to the lack of system abstraction support for NPU hardware and to the lack of architectural and ISA support for fine-grained dynamic operator scheduling on virtualized NPUs. We present Neu10, a holistic NPU virtualization framework, and investigate virtualization techniques for NPUs across the entire software and hardware stack. Neu10 consists of (1) a flexible NPU abstraction called vNPU, which enables fine-grained virtualization of the…

Taxonomy

Topics: Neural Networks and Applications
