PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

Yujeong Choi; Minsoo Rhu

arXiv:1909.04548 · cs.DC · September 11, 2019


TL;DR

This paper introduces PREMA, a predictive multi-task scheduling algorithm for preemptible NPUs. In cloud-based DNN acceleration, preemptive NPU multi-tasking improves latency, throughput, and SLA satisfaction by an average of 7.8x, 1.4x, and 4.8x, respectively.

Contribution

The paper proposes a preemptible NPU design combined with a predictive multi-task scheduler, so that high-priority inference meets its latency demands while the shared cloud DNN service maintains high throughput.

Findings

01 Preemptive NPU multi-tasking improves latency by 7.8x on average.

02 Throughput improves by 1.4x on average.

03 SLA satisfaction improves by 4.8x on average.

Abstract

To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8x, 1.4x, and 4.8x improvement in latency, throughput, and SLA satisfaction, respectively.
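
To make the latency argument in the abstract concrete, the toy simulation below compares a non-preemptive first-come-first-served policy with a preemptive, priority-based policy on a single shared accelerator. It is only a sketch under stated assumptions and is not the PREMA algorithm: the `Task` fields, the `simulate` function, the tie-breaking rule, and the example workload are all hypothetical, the predicted job length is assumed to be exact, and preemption is assumed to cost nothing.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int   # time step at which the request arrives
    length: int    # predicted execution time in steps (assumed exact here)
    priority: int  # larger value means more urgent


def pick(ready, remaining, preemptive):
    """Choose the next task to run from the ready list."""
    if preemptive:
        # Highest priority first, then shortest predicted remaining time.
        return min(ready, key=lambda t: (-t.priority, remaining[t.name]))
    # Non-preemptive baseline: first come, first served.
    return min(ready, key=lambda t: t.arrival)


def simulate(tasks, preemptive):
    """Return {task name: completion time} for one shared accelerator.

    Assumption: switching tasks is free. A real preemptible NPU would pay
    some cost to save and restore state, which this sketch ignores.
    """
    remaining = {t.name: t.length for t in tasks}
    finish = {}
    running = None
    now = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks if t.arrival <= now and t.name not in finish]
        if not ready:
            now += 1  # accelerator idles until the next arrival
            continue
        # A preemptive policy re-evaluates every step; the baseline only
        # picks a new task once the current one has finished.
        if preemptive or running is None:
            running = pick(ready, remaining, preemptive)
        remaining[running.name] -= 1
        now += 1
        if remaining[running.name] == 0:
            finish[running.name] = now
            running = None
    return finish


# Hypothetical workload: a long low-priority job and a short urgent one.
tasks = [
    Task("batch", arrival=0, length=10, priority=0),
    Task("urgent", arrival=2, length=3, priority=1),
]

for mode in (False, True):
    done = simulate(tasks, preemptive=mode)
    latency = {t.name: done[t.name] - t.arrival for t in tasks}
    print("preemptive" if mode else "non-preemptive", latency)
```

In this workload the urgent request waits behind the long job under the baseline, while the preemptive policy serves it as soon as it arrives. The total completion time is unchanged only because preemption is assumed to be free.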


Code & Models

Repositories

agongee/prema_sim
pytorch
