Towards QoS-Aware and Resource-Efficient GPU Microservices Based on Spatial Multitasking GPUs In Datacenters

Wei Zhang; Quan Chen; Kaihua Fu; Ningxin Zheng; Zhiyi Huang; Jingwen Leng; Chao Li; Wenli Zheng; Minyi Guo

arXiv:2005.02088·cs.DC·May 6, 2020·1 cites

Open Access

TL;DR

This paper introduces Camelot, a runtime system for GPU microservices that optimizes resource utilization and QoS through a global communication mechanism and contention-aware resource policies, outperforming existing solutions.

Contribution

Camelot is the first system to address GPU microservice resource management considering contention and pipeline effects, improving peak load support and resource efficiency.

Findings

01

Supports up to 64.5% higher peak load with limited GPUs

02

Reduces resource usage by 35% at low load

03

Achieves 99th percentile latency targets

Abstract

While prior research focuses on CPU-based microservices, its techniques do not apply to GPU-based microservices because the contention patterns differ. It is challenging to optimize resource utilization while guaranteeing QoS for GPU microservices. We find that the overhead is caused by inter-microservice communication, GPU resource contention, and imbalanced throughput within the microservice pipeline. We propose Camelot, a runtime system that manages GPU microservices considering the above factors. In Camelot, a global memory-based communication mechanism enables onsite data sharing that significantly reduces the end-to-end latencies of user queries. We also propose two contention-aware resource allocation policies that either maximize the peak supported service load or minimize the resource usage at low load while ensuring the required QoS. The two policies consider the…
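To make the "imbalanced throughput within the pipeline" point concrete, here is a minimal sketch, not Camelot's actual policy. It assumes, purely for illustration, that each stage's throughput scales linearly with its share of one spatially partitioned GPU and that stages do not interfere with each other (the abstract notes that real contention matters, so this is a simplification). The function names and the example rates are hypothetical.

```python
def pipeline_throughput(rates, shares):
    """Throughput of a pipeline is limited by its slowest stage.

    rates[i]  : queries/s stage i would reach with the whole GPU (assumed).
    shares[i] : fraction of the GPU given to stage i (linear-scaling assumption).
    """
    return min(r * s for r, s in zip(rates, shares))


def balanced_shares(rates):
    """Give each stage a share proportional to 1/rate so all stages
    end up with the same effective throughput under linear scaling."""
    inverse = [1.0 / r for r in rates]
    total = sum(inverse)
    return [x / total for x in inverse]


# Hypothetical three-stage pipeline (rates are made-up illustration values).
rates = [100.0, 50.0, 200.0]
even = [1.0 / len(rates)] * len(rates)
balanced = balanced_shares(rates)

even_tput = pipeline_throughput(rates, even)
balanced_tput = pipeline_throughput(rates, balanced)
print(f"even split:     {even_tput:.2f} queries/s")
print(f"balanced split: {balanced_tput:.2f} queries/s")
```

Under these assumptions an even split leaves the slowest stage as the bottleneck, while the balanced split equalizes per-stage throughput. A contention-aware policy would additionally have to model how co-located stages slow each other down.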

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability