PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Yunseong Kim, Yujeong Choi, Minsoo Rhu

TL;DR
This paper introduces PARIS, a GPU partitioning algorithm, and ELSA, an elastic scheduling algorithm, which together improve latency and utilization in cloud ML inference servers built on reconfigurable GPUs.
Contribution
It proposes a novel GPU partitioning algorithm and an elastic scheduling method specifically designed for reconfigurable GPU architectures in inference servers.
Findings
Achieves high resource utilization with low latency.
Effectively balances throughput and latency in multi-GPU inference.
Demonstrates significant performance improvements over traditional setups.
Abstract
In cloud machine learning (ML) inference systems, providing low latency to end-users is of utmost importance. However, maximizing server utilization and system throughput is also crucial for ML service providers, as it helps lower the total cost of ownership. GPUs have often been criticized for ML inference because their massive compute and memory throughput is hard to fully utilize under low-batch inference scenarios. To address this limitation, NVIDIA's recently announced Ampere GPU architecture provides features to "reconfigure" one large, monolithic GPU into multiple smaller "GPU partitions". This feature gives cloud ML service providers the ability to use the reconfigurable GPU not only for large-batch training but also for small-batch inference, with the potential to achieve high resource utilization. In this paper, we study this emerging GPU architecture with…
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Graph Theory and Algorithms
